| Title: | Bioconductor-Friendly Multithreaded BAM Processing |
|---|---|
| Description: | Multithreaded sequential BAM processing built on top of the ompBAM C++ engine. BamScale provides user-friendly BAM read and scan interfaces designed for compatibility with existing Bioconductor workflows. |
| Authors: | Chirag Parsania [aut, cre] |
| Maintainer: | Chirag Parsania <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.99.10 |
| Built: | 2026-05-26 18:04:57 UTC |
| Source: | https://github.com/bioc/BamScale |
bam_count() provides a fast chromosome-level count summary, honoring key
filtering fields from ScanBamParam (mapqFilter, flag, and which).
bam_count( file, param = NULL, threads = 1L, BPPARAM = BiocParallel::bpparam(), auto_threads = FALSE, include_unmapped = TRUE )bam_count( file, param = NULL, threads = 1L, BPPARAM = BiocParallel::bpparam(), auto_threads = FALSE, include_unmapped = TRUE )
file |
BAM input ( |
param |
Optional |
threads |
Requested number of OpenMP threads. May be capped when
|
BPPARAM |
|
auto_threads |
Logical; when |
include_unmapped |
Whether to include an extra |
Parallelism behavior matches bam_read(): BPPARAM distributes work across
BAM files, while threads controls OpenMP work within each file. If
auto_threads = TRUE and BPPARAM has multiple workers, BamScale first
limits the number of concurrently active workers to preserve the requested
per-file thread count within the detected core budget, then caps per-file
OpenMP threads only if a single file would still oversubscribe the machine.
For one file: a data.frame with columns seqname, seqlength, count.
For multiple files: named list of such data.frames.
bam <- ompBAM::example_BAM("Unsorted") bam_count(bam, threads = 2)bam <- ompBAM::example_BAM("Unsorted") bam_count(bam, threads = 2)
bam_read() is a multithreaded sequential BAM reader built on top of
ompBAM. The interface is designed to be familiar to users of
Rsamtools::scanBam(), GenomicAlignments::readGAlignments(), and
GenomicAlignments::readGAlignmentPairs().
bam_read( file, param = NULL, what = NULL, tag = NULL, as = c("DataFrame", "data.frame", "GAlignments", "GAlignmentPairs", "scanBam"), seqqual_mode = c("compatible", "compact"), threads = 1L, BPPARAM = BiocParallel::bpparam(), auto_threads = FALSE, use.names = FALSE, with.which_label = FALSE, include_unmapped = TRUE )bam_read( file, param = NULL, what = NULL, tag = NULL, as = c("DataFrame", "data.frame", "GAlignments", "GAlignmentPairs", "scanBam"), seqqual_mode = c("compatible", "compact"), threads = 1L, BPPARAM = BiocParallel::bpparam(), auto_threads = FALSE, use.names = FALSE, with.which_label = FALSE, include_unmapped = TRUE )
file |
A BAM input. Supported values are:
|
param |
Optional |
what |
Character vector of fields to return, similar to
|
tag |
Character vector of 2-letter tag names to extract. |
as |
Output format:
|
seqqual_mode |
Controls representation of
|
threads |
Requested number of OpenMP threads used for
reading/decompression. May be capped when |
BPPARAM |
|
auto_threads |
Logical; when |
use.names |
Passed to alignment object conversion. When |
with.which_label |
Logical; if |
include_unmapped |
Logical; whether unmapped records are retained
(subject to |
bam_read() is intentionally column-compatible with common BAM fields used by
Bioconductor workflows and can be used as a fast drop-in reader before
conversion to downstream classes.
Parallelism model:
BPPARAM parallelizes across files (one file per BiocParallel worker).
threads parallelizes within each file via OpenMP.
Effective total concurrency is approximately
min(length(file), BiocParallel::bpnworkers(BPPARAM)) * threads.
If auto_threads = TRUE and BPPARAM has multiple workers, BamScale
first limits the number of concurrently active workers to preserve the
requested per-file thread count within the detected core budget, then
caps per-file OpenMP threads only if a single file would still
oversubscribe the machine.
Compatibility notes:
Region filtering via param$which is supported as a sequential filter
(not index-jump random access).
Flag filtering uses ScanBamFlag semantics by converting logical flag
requirements into required-set and required-unset bit masks.
Tag values are returned as character columns. Scalar tags are scalar
strings; B tags are comma-separated vectors.
seqqual_mode = "compact" is optimized for throughput-oriented
benchmarking and returns raw list-columns for seq/qual, not ordinary
sequence or quality strings. In this representation, seq contains
BAM-packed nucleotide bytes and qual contains raw Phred bytes. Compact
output is intended for users who want to defer or avoid full
string-materialization costs; use decode_compact_seq(),
decode_compact_qual(), or decode_seqqual_compact() to decode compact
output back to standard string form when needed.
"GAlignments" and "GAlignmentPairs" output exclude unmapped records.
as = "scanBam" returns a strict scan-like list-of-lists:
without param$which, it returns one unnamed batch; with param$which,
it returns one batch per range label (including empty ranges), with
requested what fields and tag values under $tag.
In this output mode, seq and qual are returned as
Biostrings::DNAStringSet and Biostrings::PhredQuality for closer
scanBam() compatibility.
If file is length 1: one object in the format specified by as.
If file has length > 1 (or is a BamFileList): a named list of outputs,
one per BAM file.
bam <- ompBAM::example_BAM("Unsorted") # Familiar scanBam-like field selection x <- bam_read(bam, what = c("qname", "flag", "rname", "pos", "cigar")) # Include sequence + quality y <- bam_read(bam, what = c("qname", "seq", "qual"), threads = 2) # scanBam-shaped output z <- bam_read(bam, what = c("qname", "flag"), tag = "NM", as = "scanBam")bam <- ompBAM::example_BAM("Unsorted") # Familiar scanBam-like field selection x <- bam_read(bam, what = c("qname", "flag", "rname", "pos", "cigar")) # Include sequence + quality y <- bam_read(bam, what = c("qname", "seq", "qual"), threads = 2) # scanBam-shaped output z <- bam_read(bam, what = c("qname", "flag"), tag = "NM", as = "scanBam")
Decodes qual values returned by bam_read(..., seqqual_mode = "compact")
back to ASCII Phred-quality strings.
decode_compact_qual(qual)decode_compact_qual(qual)
qual |
A list (or list-column) of |
A character vector containing decoded quality strings. Entries with
all-missing quality bytes are returned as "*", matching BamScale's
compatibility mode.
decode_compact_seq(), decode_seqqual_compact(), bam_read()
decode_compact_qual( qual = list(as.raw(c(0L, 1L, 2L, 3L))) )decode_compact_qual( qual = list(as.raw(c(0L, 1L, 2L, 3L))) )
Decodes seq values returned by bam_read(..., seqqual_mode = "compact")
back to ordinary character strings.
decode_compact_seq(seq, qwidth)decode_compact_seq(seq, qwidth)
seq |
A list (or list-column) of |
qwidth |
Integer vector of read widths. This is required because compact sequence bytes use BAM's 4-bit packed encoding (two bases per byte). |
A character vector containing decoded sequence strings.
decode_compact_qual(), decode_seqqual_compact(), bam_read()
decode_compact_seq( seq = list(as.raw(c(0x12, 0x48))), qwidth = 4L )decode_compact_seq( seq = list(as.raw(c(0x12, 0x48))), qwidth = 4L )
seq and qual columns in BamScale outputConvenience wrapper for converting a compact bam_read() result back to
ordinary sequence and quality strings.
decode_seqqual_compact( x, seq_col = "seq", qual_col = "qual", qwidth_col = "qwidth" )decode_seqqual_compact( x, seq_col = "seq", qual_col = "qual", qwidth_col = "qwidth" )
x |
A |
seq_col |
Name of the compact sequence column. |
qual_col |
Name of the compact quality column. |
qwidth_col |
Name of the read-width column used to decode compact sequence bytes. |
x with compact seq and/or qual columns replaced by decoded
character vectors. The input class is preserved.
decode_compact_seq(), decode_compact_qual(), bam_read()
x <- data.frame(qwidth = 4L) x$seq <- I(list(as.raw(c(0x12, 0x48)))) x$qual <- I(list(as.raw(c(0L, 1L, 2L, 3L)))) decode_seqqual_compact(x)x <- data.frame(qwidth = 4L) x$seq <- I(list(as.raw(c(0x12, 0x48)))) x$qual <- I(list(as.raw(c(0L, 1L, 2L, 3L)))) decode_seqqual_compact(x)