| Title: | Introduce *Ranges to bedtools users |
|---|---|
| Description: | Translates bedtools command-line invocations to R code calling functions from the Bioconductor *Ranges infrastructure. This is intended to educate novice Bioconductor users and to compare the syntax and semantics of the two frameworks. |
| Authors: | Michael Lawrence |
| Maintainer: | Michael Lawrence <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.39.0 |
| Built: | 2026-05-30 08:06:21 UTC |
| Source: | https://github.com/bioc/HelloRanges |
HelloRanges uses docopt for parsing the argument string
passed as the cmd argument to functions like
bedtools_intersect. bedtools has its own style of
argument formatting. Here we document the subtle differences.
Here are the specific differences:
docopt requires that multi-character arguments are
prefixed by two hyphens, e.g., --bed. However,
bedtools expects only a single hyphen. It turns out
docopt is robust to the single-hyphen case, except for
the first argument. Since the typical convention is to first
indicate the file, e.g., -a or -i, this
incompatibility does not often arise in practice.
docopt does not allow values of argument to be
space-separated, while bedtools often expects space
separation for multi-valued arguments. As a compromise,
HelloRanges expects the values to be comma-separated. Thus,
-b b.bed c.bed needs to be -b b.bed,c.bed.
Most shells support nested commands within parentheses, e.g.,
-b < (grep foo file.bed), but docopt does not
support that. Instead, nested commands should be enclosed in
double quotes, e.g., -b < "grep foo file.bed". Such
constructs are supported via pipe.
Michael Lawrence
Finds the features in one dataset that are closest to those in another. Supports restriction by strand, upstream, downstream, and overlap. There are several methods for resolving ties. Optionally returns the distance.
bedtools_closest(cmd = "--help") R_bedtools_closest(a, b, s = FALSE, S = FALSE, d = FALSE, D = c("none", "ref", "a", "b"), io = FALSE, iu = FALSE, id = FALSE, fu = FALSE, fd = FALSE, t = c("all", "first", "last"), mdb = c("each", "all"), k = 1L, names = NULL, filenames = FALSE, N = FALSE) do_bedtools_closest(a, b, s = FALSE, S = FALSE, d = FALSE, D = c("none", "ref", "a", "b"), io = FALSE, iu = FALSE, id = FALSE, fu = FALSE, fd = FALSE, t = c("all", "first", "last"), mdb = c("each", "all"), k = 1L, names = NULL, filenames = FALSE, N = FALSE)bedtools_closest(cmd = "--help") R_bedtools_closest(a, b, s = FALSE, S = FALSE, d = FALSE, D = c("none", "ref", "a", "b"), io = FALSE, iu = FALSE, id = FALSE, fu = FALSE, fd = FALSE, t = c("all", "first", "last"), mdb = c("each", "all"), k = 1L, names = NULL, filenames = FALSE, N = FALSE) do_bedtools_closest(a, b, s = FALSE, S = FALSE, d = FALSE, D = c("none", "ref", "a", "b"), io = FALSE, iu = FALSE, id = FALSE, fu = FALSE, fd = FALSE, t = c("all", "first", "last"), mdb = c("each", "all"), k = 1L, names = NULL, filenames = FALSE, N = FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
a |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Each feature in |
b |
Like |
s |
Require same strandedness. That is, find the closest feature in
|
S |
Require opposite strandedness. That is, find the closest feature in
|
d |
In addition to the closest feature in |
D |
Like |
io |
Ignore features in |
iu |
Ignore features in |
id |
Ignore features in |
fu |
Choose first from features in |
fd |
Choose first from features in |
t |
Specify how ties for closest feature should be handled. This occurs
when two features in |
mdb |
How multiple databases should be resolved, either “each”
(find closest in each |
k |
Not supported yet. Report the |
names |
When using multiple databases, provide an alias for each to use instead of their integer index. If a single string, can be comma-separated. |
filenames |
When using multiple databases, use their complete filename instead of their integer index. |
N |
Not yet supported, but related use cases are often solved by
passing a single argument to
|
As with all commands, there are three interfaces to the
closest command:
bedtools_closestParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_closestAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_closestEvaluates the result of
R_bedtools_closest. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
The generated code includes calls to utilities like nearest, precede and follow. nearest lacks the ability to restrict its search by direction/overlap, so some complex code is generated to support all of the argument combinations.
Arguments io, iu, id, fu, and fd
require a notion of upstream/downstream to be defined via D,
which accepts one of these values:
Report distance with respect to the reference genome. B features with a lower (start, stop) are “upstream”.
Report distance with respect to A. When A is on the - strand, “upstream” means B has a higher (start,stop).
Report distance with respect to B. When B is on the - strand, “upstream” means A has a higher (start,stop).
A language object containing the compiled R code, evaluating to a
Pairs object with the closest hits from a and b. If
d or D is TRUE, has a metadata column called
“distance”.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/closest.html
nearest-methods for the various ways to find the nearest ranges.
## Not run: setwd(system.file("unitTests", "data", "closest", package="HelloRanges")) ## End(Not run) ## basic bedtools_closest("-a a.bed -b b.bed -d") ## strand-specific bedtools_closest("-a strand-test-a.bed -b strand-test-b.bed -s") ## break ties bedtools_closest("-a close-a.bed -b close-b.bed -t first") ## multiple databases bedtools_closest("-a mq1.bed -b mdb1.bed,mdb2.bed,mdb3.bed -names a,b,c") ## ignoring upstream bedtools_closest("-a d.bed -b d_iu.bed -D ref -iu")## Not run: setwd(system.file("unitTests", "data", "closest", package="HelloRanges")) ## End(Not run) ## basic bedtools_closest("-a a.bed -b b.bed -d") ## strand-specific bedtools_closest("-a strand-test-a.bed -b strand-test-b.bed -s") ## break ties bedtools_closest("-a close-a.bed -b close-b.bed -t first") ## multiple databases bedtools_closest("-a mq1.bed -b mdb1.bed,mdb2.bed,mdb3.bed -names a,b,c") ## ignoring upstream bedtools_closest("-a d.bed -b d_iu.bed -D ref -iu")
Finds regions of the genome that are not covered by a genomic dataset.
bedtools_complement(cmd = "--help") R_bedtools_complement(i, g) do_bedtools_complement(i, g)bedtools_complement(cmd = "--help") R_bedtools_complement(i, g) do_bedtools_complement(i, g)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
i |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
g |
A genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
As with all commands, there are three interfaces to the
complement command:
bedtools_complementParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_complementAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_complementEvaluates the result of
R_bedtools_complement. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
The generated code is subtracts, via
setdiff, the ranges from the set of
ranges representing the entire genome.
While it may be tempting to call gaps
instead, it is very unlikely to behave as expected. The GenomicRanges
set operations treat all three strand values (+, -, *) as separate
spaces. gaps takes as its universe the genome on all three
strands, rather than just the “*” strand, resulting in
extraneous stranded ranges.
A language object containing the compiled R code, evaluating to a GRanges object with the complementary ranges.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/complement.html
setops-methods for the various set operations.
## Not run: setwd(system.file("unitTests", "data", "coverage", package="HelloRanges")) ## End(Not run) bedtools_complement("-i a.bed -g test.genome")## Not run: setwd(system.file("unitTests", "data", "coverage", package="HelloRanges")) ## End(Not run) bedtools_complement("-i a.bed -g test.genome")
Compute the coverage of one or more datasets over a set of query ranges.
bedtools_coverage(cmd = "--help") R_bedtools_coverage(a, b, hist = FALSE, d = FALSE, counts = FALSE, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE, g = NA_character_, header = FALSE, sortout = FALSE) do_bedtools_coverage(a, b, hist = FALSE, d = FALSE, counts = FALSE, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE, g = NA_character_, header = FALSE, sortout = FALSE)bedtools_coverage(cmd = "--help") R_bedtools_coverage(a, b, hist = FALSE, d = FALSE, counts = FALSE, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE, g = NA_character_, header = FALSE, sortout = FALSE) do_bedtools_coverage(a, b, hist = FALSE, d = FALSE, counts = FALSE, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE, g = NA_character_, header = FALSE, sortout = FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
a |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. The coverage is computed
over these ranges. Use |
b |
Like |
hist |
Report a histogram of coverage for each feature in |
d |
Report the depth at each position in each |
counts |
Only report the count of overlaps, not fraction, etc.
Restricted by |
f |
Minimum overlap required as a fraction of A [default: any overlap]. |
F |
Minimum overlap required as a fraction of B [default: any overlap]. |
r |
Require that the fraction of overlap be reciprocal for |
e |
Require that the minimum fraction be satisfied for |
s |
Force strandedness. That is, only count ranges in |
S |
Require opposite strandedness. That is, count the features in
|
split |
Treat split BAM (i.e., having an ‘N’ CIGAR operation) or BED12 entries as compound ranges with gaps, i.e., as GRangesList objects. |
g |
A genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
header |
Ignored. |
sortout |
Sort the result by position. |
As with all commands, there are three interfaces to the
coverage command:
bedtools_coverageParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_coverageAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_coverageEvaluates the result of
R_bedtools_coverage. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
Typically, we compute coverage with
coverage, but features like fractional
overlap restriction and histograms add (educational) complexity. One
key trick is the [,List,GenomicRanges method, which lets us
extract coverage vectors for specific regions (see the generated
code).
A language object containing the compiled R code, evaluating to a GRanges object with coverage information. The exact type of information depends on the mode:
default |
Three metadata columns: “count” (the number of overlapping ranges), “covered” (the number of bases covered in the query), “fraction” (the fraction of bases covered). |
d |
Metadata column “coverage” is an RleList with position-level coverage (depth). This is what we typically refer to as coverage in Bioconductor. |
hist |
Metadata column “coverage” is a list of DataFrames. Each
DataFrame contains a histogram of the coverage depth over that
range, with columns “coverage” (the coverage value),
“count” (the number of positions with that coverage),
“len” (the length of the region, all the same) and
“fraction” (the fraction of positions at that coverage).
There is also a “coverage” component on |
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/coverage.html
coverage-methods for ways to compute coverage.
## Not run: setwd(system.file("unitTests", "data", "coverage", package="HelloRanges")) ## End(Not run) ## default behavior bedtools_coverage("-a a.bed -b b.bed") ## histogram bedtools_coverage("-a a.bed -b b.bed -hist -g test.genome") ## per-position depth bedtools_coverage("-a a.bed -b b.bed -d -g test.genome")## Not run: setwd(system.file("unitTests", "data", "coverage", package="HelloRanges")) ## End(Not run) ## default behavior bedtools_coverage("-a a.bed -b b.bed") ## histogram bedtools_coverage("-a a.bed -b b.bed -hist -g test.genome") ## per-position depth bedtools_coverage("-a a.bed -b b.bed -d -g test.genome")
Compute flanking regions.
bedtools_flank(cmd = "--help") R_bedtools_flank(i, b = 0, l = 0, r = 0, s = FALSE, pct = FALSE, g = NULL, header = FALSE) do_bedtools_flank(i, b = 0, l = 0, r = 0, s = FALSE, pct = FALSE, g = NULL, header = FALSE)bedtools_flank(cmd = "--help") R_bedtools_flank(i, b = 0, l = 0, r = 0, s = FALSE, pct = FALSE, g = NULL, header = FALSE) do_bedtools_flank(i, b = 0, l = 0, r = 0, s = FALSE, pct = FALSE, g = NULL, header = FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
i |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
b |
Increase the BED/GFF/VCF range by the same number base pairs in each direction. Integer. |
l |
The number of base pairs to subtract from the start coordinate. Integer. |
r |
The number of base pairs to add to the end coordinate. Integer. |
s |
Define |
pct |
Define |
g |
Genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
header |
Ignored. |
As with all commands, there are three interfaces to the
flank command:
bedtools_flankParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_flankAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_flankEvaluates the result of
R_bedtools_flank. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
We compute the flanks with flank,
although flank only computes one side at a time, so we may call
it multiple times.
A language object containing the compiled R code, evaluating to a GRanges object.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/flank.html
intra-range-methods for flank.
## Not run: setwd(system.file("unitTests", "data", "flank", package="HelloRanges")) ## End(Not run) ## 5 on both sides r <- bedtools_flank("-i a.bed -b 5 -g tiny.genome") ## 5 on left side bedtools_flank("-i a.bed -l 5 -r 0 -g tiny.genome") ## define left/right in terms of transcription direction bedtools_flank("-i a.bed -l 5 -r 0 -s -g tiny.genome")## Not run: setwd(system.file("unitTests", "data", "flank", package="HelloRanges")) ## End(Not run) ## 5 on both sides r <- bedtools_flank("-i a.bed -b 5 -g tiny.genome") ## 5 on left side bedtools_flank("-i a.bed -l 5 -r 0 -g tiny.genome") ## define left/right in terms of transcription direction bedtools_flank("-i a.bed -l 5 -r 0 -s -g tiny.genome")
Compute coverage over the genome. By default, this computes a per-chromosome histogram of the coverage, but options allow for per-position coverage to be returned in different ways.
bedtools_genomecov(cmd = "--help") R_bedtools_genomecov(i, g = NA_character_, d = FALSE, dz = FALSE, bg = FALSE, bga = FALSE, split = FALSE, strand = c("any", "+", "-"), `5` = FALSE, `3` = FALSE, max = NULL, scale = 1, pc = FALSE, fs = NULL) do_bedtools_genomecov(i, g = NA_character_, d = FALSE, dz = FALSE, bg = FALSE, bga = FALSE, split = FALSE, strand = c("any", "+", "-"), `5` = FALSE, `3` = FALSE, max = NULL, scale = 1, pc = FALSE, fs = NULL)bedtools_genomecov(cmd = "--help") R_bedtools_genomecov(i, g = NA_character_, d = FALSE, dz = FALSE, bg = FALSE, bga = FALSE, split = FALSE, strand = c("any", "+", "-"), `5` = FALSE, `3` = FALSE, max = NULL, scale = 1, pc = FALSE, fs = NULL) do_bedtools_genomecov(i, g = NA_character_, d = FALSE, dz = FALSE, bg = FALSE, bga = FALSE, split = FALSE, strand = c("any", "+", "-"), `5` = FALSE, `3` = FALSE, max = NULL, scale = 1, pc = FALSE, fs = NULL)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
i |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
g |
Genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
d |
Report the depth at each genome position. This causes a GPos object to be returned. |
dz |
Same as |
bg |
Like |
bga |
Like |
split |
Treat split BAM (i.e., having an ‘N’ CIGAR operation) or BED12 entries as compound ranges with gaps, i.e., as GRangesList objects. |
strand |
Calculate coverage of intervals from a specific strand. |
5 |
Calculate coverage of 5' positions (instead of entire interval). |
3 |
Calculate coverage of 3' positions (instead of entire interval). |
max |
Combine all positions with a depth >= |
scale |
Scale the coverage by a constant factor. Each coverage value is multiplied by this factor before being reported. Useful for normalizing coverage by, e.g., reads per million (RPM). |
pc |
Calculates coverage of intervals from left point of a pair reads to the right point. Works for BAM files only. |
fs |
Forces to use the given fragment size instead of read length. |
As with all commands, there are three interfaces to the
genomecov command:
bedtools_genomecovParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_genomecovAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_genomecovEvaluates the result of
R_bedtools_genomecov. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
We typically compute the coverage with
coverage. Computing the histogram
requires more work.
A language object containing the compiled R code, evaluating to an object that depends on the mode:
default |
A DataFrame that is a per-chromosome histogram of the coverage that includes the whole genome margin. Includes columns “seqnames” (the chromosome name, or “genome”), “coverage” (the coverage value), “count” (the count of positions covered at that value), “len” (the length of the chromosome/genome), “fraction” (the fraction of bases covered at the value). |
d, dz
|
A GPos object with per-position coverage values. |
bg, bga
|
A GRanges object with coverage runs. |
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html
intra-range-methods for genomecov.
## Not run: setwd(system.file("unitTests", "data", "genomecov", package="HelloRanges")) ## End(Not run) ## get coverage runs as a GRanges bedtools_genomecov("-i y.bed -bg -g test.genome") ## get coverage depth as a GPos, dropping zero values, ignore junctions bedtools_genomecov("-i three_blocks.bam -dz -split") ## custom fragment size bedtools_genomecov("-i chip.bam -bg -fs 100")## Not run: setwd(system.file("unitTests", "data", "genomecov", package="HelloRanges")) ## End(Not run) ## get coverage runs as a GRanges bedtools_genomecov("-i y.bed -bg -g test.genome") ## get coverage depth as a GPos, dropping zero values, ignore junctions bedtools_genomecov("-i three_blocks.bam -dz -split") ## custom fragment size bedtools_genomecov("-i chip.bam -bg -fs 100")
Query sequence from a FASTA file given a set of ranges, including compound regions like transcripts and junction reads. This assumes the sequence is DNA.
bedtools_getfasta(cmd = "--help") R_bedtools_getfasta(fi, bed, s = FALSE, split = FALSE) do_bedtools_getfasta(fi, bed, s = FALSE, split = FALSE)bedtools_getfasta(cmd = "--help") R_bedtools_getfasta(fi, bed, s = FALSE, split = FALSE) do_bedtools_getfasta(fi, bed, s = FALSE, split = FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
fi |
Path to a FASTA file, or an XStringSet object. |
bed |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges, as the query. Use
|
s |
Force strandedness. If the feature occupies the antisense strand, the sequence will be reverse complemented. |
split |
Given BED12 or BAM input, extract and concatenate the sequences from the blocks (e.g., exons). |
As with all commands, there are three interfaces to the
getfasta command:
bedtools_getfastaParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_getfastaAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_getfastaEvaluates the result of
R_bedtools_getfasta. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
It is recommended to retrieve reference sequence using a
BSgenome package, either custom or provided by
Bioconductor. Call getSeq to query for
specific regions of the BSgenome object. If one must access a file,
consider converting it to 2bit or FA (razip) format for indexed
access using import and its which
argument.
But if one must access a FASTA file, we need to read all of it with
readDNAStringSet and extract regions using
x[gr], where gr is a GRanges or GRangesList.
A language object containing the compiled R code, evaluating to a DNAStringSet object.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html
getSeq, the primary sequence query interface.
## Not run: setwd(system.file("unitTests", "data", "getfasta", package="HelloRanges")) ## End(Not run) ## simple query bedtools_getfasta("--fi t.fa -bed blocks.bed") ## get spliced transcript/read sequence bedtools_getfasta("--fi t.fa -bed blocks.bed -split")## Not run: setwd(system.file("unitTests", "data", "getfasta", package="HelloRanges")) ## End(Not run) ## simple query bedtools_getfasta("--fi t.fa -bed blocks.bed") ## get spliced transcript/read sequence bedtools_getfasta("--fi t.fa -bed blocks.bed -split")
Query sequence from a FASTA file given a set of ranges, including compound regions like transcripts and junction reads. This assumes the sequence is DNA.
bedtools_groupby(cmd = "--help") R_bedtools_groupby(i, g = 1:3, c, o = "sum", delim=",") do_bedtools_groupby(i, g = 1:3, c, o = "sum", delim=",")bedtools_groupby(cmd = "--help") R_bedtools_groupby(i, g = 1:3, c, o = "sum", delim=",") do_bedtools_groupby(i, g = 1:3, c, o = "sum", delim=",")
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
i |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
g |
Column index(es) for grouping the input. Columns may be comma-separated. By default, the grouping is by range. |
c |
Specify columns (by integer index) from the input file to operate
upon (see |
o |
Specify the operations (by name) that should be applied to the
columns indicated in |
delim |
Delimiter character used to collapse strings. |
As with all commands, there are three interfaces to the
groupby command:
bedtools_groupbyParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_groupbyAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_groupbyEvaluates the result of
R_bedtools_groupby. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
The workhorse for aggregation in R is
aggregate and we have extended its
interface to make it more convenient. See
aggregate for details.
The following operations are supported (with R translation):
sum(X)
min(X)
max(X)
min(abs(X))
max(abs(X))
mean(X)
median(X)
distmode(X)
distmode(X, anti=TRUE)
unstrsplit(X, delim)
unstrsplit(unique(X), delim)
lengths(X)
lengths(unique(X))
sd(X)
freqtable(X)
firstdrop(heads(X, 1L))
lastdrop(tails(X, 1L))
For the sake of simplicity, and because the use cases are not clear, we do not support aggregation of every column. Here are some of the restrictions:
No support for the last column of GFF (the ragged list of attributes).
No support for the INFO, FORMAT and GENO fields of VCF.
No support for the FLAG field of BAM (bedtools does
not support this either).
A language object containing the compiled R code, generally evaluating to a DataFrame, with a column for each grouping variable and each summarized variable. As a special case, if there are no grouping variables specified, then the grouping is by range, and an aggregated GRanges is returned.
We admit that using column subscripts for c makes code hard
to read. All the more reason to just write R code.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/groupby.html
aggregate-methods for general aggregation.
## Not run: setwd(system.file("unitTests", "data", "groupby", package="HelloRanges")) ## End(Not run) ## aggregation by range bedtools_groupby("-i values3.header.bed -c 5") ## average variant qualities by chromosome and reference base ## Not run: indexTabix(bgzip("a_vcfSVtest.vcf", overwrite=TRUE), "vcf") ## End(Not run) bedtools_groupby("-i a_vcfSVtest.vcf.bgz -g 1,4 -c 6 -o mean")## Not run: setwd(system.file("unitTests", "data", "groupby", package="HelloRanges")) ## End(Not run) ## aggregation by range bedtools_groupby("-i values3.header.bed -c 5") ## average variant qualities by chromosome and reference base ## Not run: indexTabix(bgzip("a_vcfSVtest.vcf", overwrite=TRUE), "vcf") ## End(Not run) bedtools_groupby("-i a_vcfSVtest.vcf.bgz -g 1,4 -c 6 -o mean")
Finds and/or counts the intersections between two ranged datasets.
bedtools_intersect(cmd = "--help") R_bedtools_intersect(a, b, ubam = FALSE, bed = FALSE, wa = FALSE, wb = FALSE, loj = FALSE, wo = FALSE, wao = FALSE, u = FALSE, c = FALSE, v = FALSE, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE, g = NA_character_, header = FALSE, names = NULL, filenames = FALSE, sortout = FALSE) do_bedtools_intersect(a, b, ubam = FALSE, bed = FALSE, wa = FALSE, wb = FALSE, loj = FALSE, wo = FALSE, wao = FALSE, u = FALSE, c = FALSE, v = FALSE, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE, g = NA_character_, header = FALSE, names = NULL, filenames = FALSE, sortout = FALSE)bedtools_intersect(cmd = "--help") R_bedtools_intersect(a, b, ubam = FALSE, bed = FALSE, wa = FALSE, wb = FALSE, loj = FALSE, wo = FALSE, wao = FALSE, u = FALSE, c = FALSE, v = FALSE, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE, g = NA_character_, header = FALSE, names = NULL, filenames = FALSE, sortout = FALSE) do_bedtools_intersect(a, b, ubam = FALSE, bed = FALSE, wa = FALSE, wb = FALSE, loj = FALSE, wo = FALSE, wao = FALSE, u = FALSE, c = FALSE, v = FALSE, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE, g = NA_character_, header = FALSE, names = NULL, filenames = FALSE, sortout = FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
a |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Each feature in |
b |
Like |
ubam |
Not supported yet. Write uncompressed BAM output. The default is write compressed BAM output. |
bed |
When using BAM input, return a GRanges (with a “blocks” column) instead of a GAlignments. For VCF input, return a VRanges instead of a VCF object. |
wa |
Return the original entry in |
wb |
Return the original entry in |
loj |
Perform a ‘left outer join’. That is, for each feature in
|
wo |
Return the number of base pairs of
overlap between the two features as the “overlap_width”
metadata column. Implies |
wao |
Like |
u |
Like |
c |
Like |
v |
Like |
f |
Minimum overlap required as a fraction of |
F |
Minimum overlap required as a fraction of |
r |
Require that the fraction of overlap be reciprocal for |
e |
Require that the minimum fraction be satisfied for |
s |
Require same strandedness. That is, find the intersect feature in
|
S |
Require opposite strandedness. That is, find the intersect feature in
|
split |
Treat split BAM (i.e., having an ‘N’ CIGAR operation) or BED12 entries as compound ranges with gaps, i.e., as GRangesList objects. |
g |
A genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
header |
Ignored. |
names |
When using multiple databases, provide an alias for each to use instead of their integer index. If a single string, can be comma-separated. |
filenames |
When using multiple databases, use their complete filename instead of their integer index. |
sortout |
Sort the result by genomic coordinate. |
As with all commands, there are three interfaces to the
intersect command:
bedtools_intersectParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_intersectAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_intersectEvaluates the result of
R_bedtools_intersect. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
This is by far the most complex bedtools command, and it
offers a dizzying list of parameters, many of which are redundant or
mutually exclusive. The complexity of the generated code is highest
when using the fractional restriction feature, since no such support
exists in the GenomicRanges overlap routines.
A language object containing the compiled R code, evaluating to a
ranges object, the exact type of which depends on the arguments. If
both wa and wb are TRUE, return a Pairs object
with the original, matched up ranges, possibly with metadata
columns. By default, the return value is a GAlignments for BAM input,
a VCF object for VCF input, or a GRanges for any other type of input.
If bed is TRUE, BAM input is converted to a GRanges,
containing a “blocks” column (encoding the junctions) if the
input is BAM. If the input is VCF, bed=TRUE converts the input
to a VRanges.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html
setops-methods for set operations including intersect, findOverlaps-methods for different ways to detect overlaps.
## Not run: setwd(system.file("unitTests", "data", "intersect", package="HelloRanges")) ## End(Not run) ## return intersecting ranges bedtools_intersect("-a a.bed -b a.bed") ## add count bedtools_intersect("-a a.bed -b b.bed -c") ## restrict by strand and fraction of overlap bedtools_intersect("-a a.bed -b b.bed -c -s -f 0.1") ## return original 'a' ranges bedtools_intersect("-a a.bed -b b.bed -wa") ## return both 'a' and 'b' ranges, along with overlap widths bedtools_intersect("-a a.bed -b b.bed -wo") ## same as above, except left outer join bedtools_intersect("-a a.bed -b b.bed -wao") ## consider read junction structure bedtools_intersect("-a three_blocks.bam -b three_blocks_nomatch.bed -split")## Not run: setwd(system.file("unitTests", "data", "intersect", package="HelloRanges")) ## End(Not run) ## return intersecting ranges bedtools_intersect("-a a.bed -b a.bed") ## add count bedtools_intersect("-a a.bed -b b.bed -c") ## restrict by strand and fraction of overlap bedtools_intersect("-a a.bed -b b.bed -c -s -f 0.1") ## return original 'a' ranges bedtools_intersect("-a a.bed -b b.bed -wa") ## return both 'a' and 'b' ranges, along with overlap widths bedtools_intersect("-a a.bed -b b.bed -wo") ## same as above, except left outer join bedtools_intersect("-a a.bed -b b.bed -wao") ## consider read junction structure bedtools_intersect("-a three_blocks.bam -b three_blocks_nomatch.bed -split")
Compare two sets of genomic regions using the Jaccard statistic, defined as the total width of the intersection, divided by the total width of the union.
bedtools_jaccard(cmd = "--help") R_bedtools_jaccard(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE) do_bedtools_jaccard(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE)bedtools_jaccard(cmd = "--help") R_bedtools_jaccard(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE) do_bedtools_jaccard(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, split = FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
a |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
b |
Like |
f |
Minimum overlap required as a fraction of |
F |
Minimum overlap required as a fraction of |
r |
Require that the fraction of overlap be reciprocal for |
e |
Require that the minimum fraction be satisfied for |
s |
Require same strandedness. That is, find the jaccard feature in
|
S |
Require opposite strandedness. That is, find the jaccard feature in
|
split |
Treat split BAM (i.e., having an ‘N’ CIGAR operation) or BED12 entries as compound ranges with gaps, i.e., as GRangesList objects. |
As with all commands, there are three interfaces to the
jaccard command:
bedtools_jaccardParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_jaccardAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_jaccardEvaluates the result of
R_bedtools_jaccard. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
This is mostly just intersect and
union, except when fractional overlap
restrictions are involved.
A language object containing the compiled R code, evaluating to a a DataFrame with four columns:
intersection |
total width of intersection |
union |
total width of union |
jaccard |
the jaccard statistic |
n_intersections |
the number of ranges representing the intersection |
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/jaccard.html
setops-methods for set operations including intersect and union.
## Not run: setwd(system.file("unitTests", "data", "jaccard", package="HelloRanges")) ## End(Not run) ## basic bedtools_jaccard("-a a.bed -b a.bed") ## excluding the gaps in compound ranges bedtools_jaccard("-a three_blocks_match.bed -b e.bed -split") ## strand and fractional overlap restriction bedtools_jaccard("-a aMixedStrands.bed -b bMixedStrands.bed -s -f 0.8")## Not run: setwd(system.file("unitTests", "data", "jaccard", package="HelloRanges")) ## End(Not run) ## basic bedtools_jaccard("-a a.bed -b a.bed") ## excluding the gaps in compound ranges bedtools_jaccard("-a three_blocks_match.bed -b e.bed -split") ## strand and fractional overlap restriction bedtools_jaccard("-a aMixedStrands.bed -b bMixedStrands.bed -s -f 0.8")
Generate a partitioning/tiling or set of sliding windows over the genome or a set of ranges.
bedtools_makewindows(cmd = "--help") R_bedtools_makewindows(b, g = NA_character_, w, s, n) do_bedtools_makewindows(b, g = NA_character_, w, s, n)bedtools_makewindows(cmd = "--help") R_bedtools_makewindows(b, g = NA_character_, w, s, n) do_bedtools_makewindows(b, g = NA_character_, w, s, n)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
b |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
g |
A genome file, identifier or Seqinfo object that defines the order
and size of the sequences. Specifying this generates windows over
the genome. Exclusive with |
w |
Window size, exclusive with |
s |
Step size (generates sliding windows). |
n |
Number of windows, exclusive with |
As with all commands, there are three interfaces to the
makewindows command:
bedtools_makewindowsParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_makewindowsAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_makewindowsEvaluates the result of
R_bedtools_makewindows. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
We view the generation of a partitioning (or tiling) as a distinct use
case from the generation of sliding windows. The two use cases
correspond to the tile and
slidingWindows functions, respectively.
A language object containing the compiled R code, evaluating to a a GRangesList containing the windows for each range (or chromosome).
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/makewindows.html
tile-methods for generating windows.
## Not run: setwd(system.file("unitTests", "data", "makewindows", package="HelloRanges")) ## End(Not run) ## tiles of width 5000 bedtools_makewindows("-b input.bed -w 5000") ## sliding windows, 5kb wide, every 2kb bedtools_makewindows("-b input.bed -w 5000 -s 2000") ## 3 tiles in each range bedtools_makewindows("-b input.bed -n 3") ## 3 tiles for each chromosome of the genome bedtools_makewindows("-g test.genome -n 3")## Not run: setwd(system.file("unitTests", "data", "makewindows", package="HelloRanges")) ## End(Not run) ## tiles of width 5000 bedtools_makewindows("-b input.bed -w 5000") ## sliding windows, 5kb wide, every 2kb bedtools_makewindows("-b input.bed -w 5000 -s 2000") ## 3 tiles in each range bedtools_makewindows("-b input.bed -n 3") ## 3 tiles for each chromosome of the genome bedtools_makewindows("-g test.genome -n 3")
Group ranges by overlap with query ranges and aggregate. By default, the scores are summed.
bedtools_map(cmd = "--help") R_bedtools_map(a, b, c = "5", o = "sum", f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, header = FALSE, split = FALSE, g = NA_character_, delim=",") do_bedtools_map(a, b, c = "5", o = "sum", f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, header = FALSE, split = FALSE, g = NA_character_, delim=",")bedtools_map(cmd = "--help") R_bedtools_map(a, b, c = "5", o = "sum", f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, header = FALSE, split = FALSE, g = NA_character_, delim=",") do_bedtools_map(a, b, c = "5", o = "sum", f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, header = FALSE, split = FALSE, g = NA_character_, delim=",")
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
a |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
b |
Like |
c |
Specify columns (by integer index) from the input file to operate
upon (see |
o |
Specify the operations (by name) that should be applied to the
columns indicated in |
f |
Minimum overlap required as a fraction of |
F |
Minimum overlap required as a fraction of |
r |
Require that the fraction of overlap be reciprocal for |
e |
Require that the minimum fraction be satisfied for |
s |
Require same strandedness. That is, find the jaccard feature in
|
S |
Require opposite strandedness. That is, find the jaccard feature in
|
header |
Ignored. |
split |
Treat split BAM (i.e., having an ‘N’ CIGAR operation) or BED12 entries as compound ranges with gaps, i.e., as GRangesList objects. |
g |
A genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
delim |
Delimiter character used to collapse strings. |
As with all commands, there are three interfaces to the
map command:
bedtools_mapParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_mapAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_mapEvaluates the result of
R_bedtools_map. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
Computing overlaps with findOverlaps
generates a Hits object, which we can pass directly
to aggregate to aggregate the subject
features that overlap the same range in the query.
There are several commands in the bedtools suite that might
be approximately implemented by passing multiple files to b and
specifying the aggregate expression table(b). That counts how
many ranges from each database/sample overlap a given query. The
covered commands are: bedtools annotate -counts,
bedtools multicov and bedtools tag.
A language object containing the compiled R code, evaluating to a
DataFrame with a “grouping” column corresponding to
as(hits, "List"), and a column for each summary.
We do not support the bedtools null argument, because
it seems more sensible to just let R decide on the value of statistics
when a group is empty.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/map.html
findOverlaps-methods for finding hits, Hits-class for manipulating them, aggregate-methods for aggregating them.
## Not run: setwd(system.file("unitTests", "data", "map", package="HelloRanges")) ## End(Not run) ## default behavior bedtools_map("-a ivls.bed -b values.bed") ## take the mode of the scores bedtools_map("-a ivls.bed -b values.bed -o mode") ## collapse the chromosome names bedtools_map("-a ivls.bed -b test.gff2 -c 1 -o collapse") ## collapse the names, restricted by fractional overlap bedtools_map("-a ivls2.bed -b values5.bed -c 4 -o collapse -f 0.7")## Not run: setwd(system.file("unitTests", "data", "map", package="HelloRanges")) ## End(Not run) ## default behavior bedtools_map("-a ivls.bed -b values.bed") ## take the mode of the scores bedtools_map("-a ivls.bed -b values.bed -o mode") ## collapse the chromosome names bedtools_map("-a ivls.bed -b test.gff2 -c 1 -o collapse") ## collapse the names, restricted by fractional overlap bedtools_map("-a ivls2.bed -b values5.bed -c 4 -o collapse -f 0.7")
Collapse overlapping and adjacent ranges into a single range, i.e.,
reduce the ranges. Then, group the original ranges by
reduced range and aggregate. By default, the scores are summed.
bedtools_merge(cmd = "--help") R_bedtools_merge(i, s = FALSE, S = c("any", "+", "-"), d = 0L, c = NULL, o = "sum", delim = ",") do_bedtools_merge(i, s = FALSE, S = c("any", "+", "-"), d = 0L, c = NULL, o = "sum", delim = ",")bedtools_merge(cmd = "--help") R_bedtools_merge(i, s = FALSE, S = c("any", "+", "-"), d = 0L, c = NULL, o = "sum", delim = ",") do_bedtools_merge(i, s = FALSE, S = c("any", "+", "-"), d = 0L, c = NULL, o = "sum", delim = ",")
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
i |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
s |
Require same strandedness. That is, find the jaccard feature in
|
S |
Force merge for one specific strand only. Follow with + or - to force merge from only the forward or reverse strand, respectively. By default, merging is done without respect to strand. |
d |
Maximum distance between features allowed for features to be merged. Default is 0. That is, overlapping and/or book-ended features are merged. |
c |
Specify columns (by integer index) from the input file to operate
upon (see |
o |
Specify the operations (by name) that should be applied to the
columns indicated in |
delim |
Delimiter character used to collapse strings. |
As with all commands, there are three interfaces to the
merge command:
bedtools_mergeParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_mergeAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_mergeEvaluates the result of
R_bedtools_merge. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
The workhorse for reduction is
reduce. Passing with.revmap=TRUE
to reduce causes it to return a list of integers, which can be
passed directly to aggregate to aggregate the
original ranges.
Since the grouping information is preserved in the result, this
function serves as a proxy for bedtools cluster.
A language object containing the compiled R code, evaluating to a
DataFrame with a “grouping” column corresponding to
as(hits, "List"), and a column for each summary.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/merge.html
bedtools_groupby for more details on bedtools-style aggregation, reduce for merging, aggregate-methods for aggregating.
## Not run: setwd(system.file("unitTests", "data", "merge", package="HelloRanges")) ## End(Not run) ## default behavior, sum the score bedtools_merge("-i a.bed") ## count the seqnames bedtools_merge("-i a.bed -c 1 -o count") ## collapse the names using "|" as the delimiter bedtools_merge("-i a.names.bed -delim \"|\" -c 4 -o collapse") ## collapse the names and sum the scores bedtools_merge("-i a.full.bed -c 4,5 -o collapse,sum") ## count and sum the scores bedtools_merge("-i a.full.bed -c 5 -o count,sum") ## only merge the positive strand features bedtools_merge("-i a.full.bed -S +")## Not run: setwd(system.file("unitTests", "data", "merge", package="HelloRanges")) ## End(Not run) ## default behavior, sum the score bedtools_merge("-i a.bed") ## count the seqnames bedtools_merge("-i a.bed -c 1 -o count") ## collapse the names using "|" as the delimiter bedtools_merge("-i a.names.bed -delim \"|\" -c 4 -o collapse") ## collapse the names and sum the scores bedtools_merge("-i a.full.bed -c 4,5 -o collapse,sum") ## count and sum the scores bedtools_merge("-i a.full.bed -c 5 -o count,sum") ## only merge the positive strand features bedtools_merge("-i a.full.bed -S +")
Summarize the ranges according to disjoin
and annotate each disjoint range with the samples that overlap the
range.
bedtools_multiinter(cmd = "--help") R_bedtools_multiinter(i, header=FALSE, names=NULL, g=NA_character_, empty=FALSE) do_bedtools_multiinter(i, header=FALSE, names=NULL, g=NA_character_, empty=FALSE)bedtools_multiinter(cmd = "--help") R_bedtools_multiinter(i, header=FALSE, names=NULL, g=NA_character_, empty=FALSE) do_bedtools_multiinter(i, header=FALSE, names=NULL, g=NA_character_, empty=FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
i |
Paths to BAM/BED/GFF/VCF/etc files (vector or comma-separated), or a list of objects. |
header |
Ignored. |
names |
Provide an alias for each to use for each |
g |
A genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
empty |
Report empty regions (i.e., regions not covered in any of the
files). This essentially yields a partitioning of the genome (and
thus requires |
As with all commands, there are three interfaces to the
multiinter command:
bedtools_multiinterParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_multiinterAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_multiinterEvaluates the result of
R_bedtools_multiinter. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
The workhorse is
disjoin. Passing with.revmap=TRUE
to disjoin causes it to return a list of integers, which we use
to extract the sample identifiers. The empty case requires a
bit more code, because we have to combine the disjoint ranges with the
gaps.
A language object containing the compiled R code, evaluating to a GRanges with a column “i” indicating the sample memberships.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/multiinter.html
disjoin for forming disjoint ranges.
## Not run: setwd(system.file("unitTests", "data", "multiinter", package="HelloRanges")) ## End(Not run) ## default behavior bedtools_multiinter("-i a.bed,b.bed,c.bed") ## custom names bedtools_multiinter("-i a.bed,b.bed,c.bed -names A,B,C") ## include empty regions, i.e., partition the genome bedtools_multiinter("-i a.bed,b.bed,c.bed -names A,B,C -empty -g test.genome")## Not run: setwd(system.file("unitTests", "data", "multiinter", package="HelloRanges")) ## End(Not run) ## default behavior bedtools_multiinter("-i a.bed,b.bed,c.bed") ## custom names bedtools_multiinter("-i a.bed,b.bed,c.bed -names A,B,C") ## include empty regions, i.e., partition the genome bedtools_multiinter("-i a.bed,b.bed,c.bed -names A,B,C -empty -g test.genome")
Summarize DNA sequences over the specified ranges.
bedtools_nuc(cmd = "--help") R_bedtools_nuc(fi, bed, s = FALSE, pattern = NULL, fullHeader = FALSE) do_bedtools_nuc(fi, bed, s = FALSE, pattern = NULL, fullHeader = FALSE)bedtools_nuc(cmd = "--help") R_bedtools_nuc(fi, bed, s = FALSE, pattern = NULL, fullHeader = FALSE) do_bedtools_nuc(fi, bed, s = FALSE, pattern = NULL, fullHeader = FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
fi |
Path to a FASTA file, or an XStringSet. |
bed |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges, as the query. Use
|
s |
Force strandedness. If the feature occupies the antisense strand, the sequence will be reverse complemented. |
pattern |
Optional sequence pattern to count in each subsequence. |
fullHeader |
Use the full FASTA header as the names. By default, use just the first word. |
As with all commands, there are three interfaces to the
nuc command:
bedtools_nucParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_nucAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_nucEvaluates the result of
R_bedtools_nuc. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
Computes AT/GC percentage and counts each type of base. Relies on
Biostrings utilities like letterFrequency
and alphabetFrequency. The counting of
pattern occurrences uses
vcountPattern.
A language object containing the compiled R code, evaluating to a
DataFrame with summary statistics including the AC and GT
percentage, and the counts of each type of base. Also includes the
count of pattern, if specified.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/nuc.html
letterFrequency for summarizing sequences, matchPattern for pattern matching.
## Not run: setwd(system.file("unitTests", "data", "nuc", package="HelloRanges")) ## End(Not run) ## default behavior, note the two dashes in '--fi' bedtools_nuc("--fi test.fasta -bed a.bed") ## with pattern counting bedtools_nuc("--fi test.fasta -bed a.bed -pattern ATA")## Not run: setwd(system.file("unitTests", "data", "nuc", package="HelloRanges")) ## End(Not run) ## default behavior, note the two dashes in '--fi' bedtools_nuc("--fi test.fasta -bed a.bed") ## with pattern counting bedtools_nuc("--fi test.fasta -bed a.bed -pattern ATA")
Compute shifting regions.
bedtools_shift(cmd = "--help") R_bedtools_shift(i, s = 0, m = 0, p = 0, pct = FALSE, g = NULL, header = FALSE) do_bedtools_shift(i, s = 0, m = 0, p = 0, pct = FALSE, g = NULL, header = FALSE)bedtools_shift(cmd = "--help") R_bedtools_shift(i, s = 0, m = 0, p = 0, pct = FALSE, g = NULL, header = FALSE) do_bedtools_shift(i, s = 0, m = 0, p = 0, pct = FALSE, g = NULL, header = FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
i |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
s |
Amount to shift all features. |
m |
Amount to shift negative strand features. |
p |
Amount to shift positive strand features. |
pct |
Define |
g |
Genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
header |
Ignored. |
As with all commands, there are three interfaces to the
shift command:
bedtools_shiftParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_shiftAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_shiftEvaluates the result of
R_bedtools_shift. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
This is a fairly straight-forward application of
shift.
A language object containing the compiled R code, evaluating to a GRanges, or similar, object. In principle, this is an endomorphism.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/shift.html
intra-range-methods for shift.
## Not run: setwd(system.file("unitTests", "data", "shift", package="HelloRanges")) ## End(Not run) ## shift all ranges by 5 bedtools_shift("-i a.bed -s 5 -g tiny.genome") ## shift only the negative strand features by 5 bedtools_shift("-i a.bed -p 0 -m 5 -g tiny.genome")## Not run: setwd(system.file("unitTests", "data", "shift", package="HelloRanges")) ## End(Not run) ## shift all ranges by 5 bedtools_shift("-i a.bed -s 5 -g tiny.genome") ## shift only the negative strand features by 5 bedtools_shift("-i a.bed -p 0 -m 5 -g tiny.genome")
Widen ranges on the left and/or right side.
bedtools_slop(cmd = "--help") R_bedtools_slop(i, b = 0, l = 0, r = 0, s = FALSE, pct = FALSE, g = NULL, header = FALSE) do_bedtools_slop(i, b = 0, l = 0, r = 0, s = FALSE, pct = FALSE, g = NULL, header = FALSE)bedtools_slop(cmd = "--help") R_bedtools_slop(i, b = 0, l = 0, r = 0, s = FALSE, pct = FALSE, g = NULL, header = FALSE) do_bedtools_slop(i, b = 0, l = 0, r = 0, s = FALSE, pct = FALSE, g = NULL, header = FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
i |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
b |
Widen the same number base pairs in each direction. |
l |
The number of base pairs to subtract from the start coordinate. |
r |
The number of base pairs to add to the end coordinate. |
s |
Define |
pct |
Define |
g |
Genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
header |
Ignored. |
As with all commands, there are three interfaces to the
slop command:
bedtools_slopParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_slopAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_slopEvaluates the result of
R_bedtools_slop. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
This is a fairly straight-forward application of
resize and the + operator on
GRanges.
A language object containing the compiled R code, evaluating to a GRanges, or similar, object. In principle, this is an endomorphism.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/slop.html
intra-range-methods for resize.
## Not run: setwd(system.file("unitTests", "data", "slop", package="HelloRanges")) ## End(Not run) ## widen on both ends bedtools_slop("-i a.bed -b 5 -g tiny.genome") ## widen only on the left end bedtools_slop("-i a.bed -l 5 -r 0 -g tiny.genome")## Not run: setwd(system.file("unitTests", "data", "slop", package="HelloRanges")) ## End(Not run) ## widen on both ends bedtools_slop("-i a.bed -b 5 -g tiny.genome") ## widen only on the left end bedtools_slop("-i a.bed -l 5 -r 0 -g tiny.genome")
Subtracts one set of ranges from another, either by position or range.
bedtools_subtract(cmd = "--help") R_bedtools_subtract(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, A = FALSE, N = FALSE, g = NA_character_) do_bedtools_subtract(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, A = FALSE, N = FALSE, g = NA_character_)bedtools_subtract(cmd = "--help") R_bedtools_subtract(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, A = FALSE, N = FALSE, g = NA_character_) do_bedtools_subtract(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE, s = FALSE, S = FALSE, A = FALSE, N = FALSE, g = NA_character_)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
a |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Each feature in |
b |
Like |
f |
Minimum overlap required as a fraction of |
F |
Minimum overlap required as a fraction of |
r |
Require that the fraction of overlap be reciprocal for |
e |
Require that the minimum fraction be satisfied for |
s |
Require same strandedness. That is, find the subtract feature in
|
S |
Require opposite strandedness. That is, find the subtract feature in
|
A |
Remove entire feature if any overlap. If a feature in |
N |
Same as |
g |
A genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
As with all commands, there are three interfaces to the
subtract command:
bedtools_subtractParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_subtractAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_subtractEvaluates the result of
R_bedtools_subtract. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
We typically subtract sets of ranges using
setdiff; however, that will not work
here, because we cannot merge the ranges in a.
The algorithm has two modes: by position (where ranges are clipped)
and by range (where ranges are discarded entirely). The position mode
is the default. We find overlaps, optionally restrict them, and for
each range in a, we subtract all of the qualifying
intersections in b.
When A or N are TRUE, we use the second mode. In
the simplest case, that is just
subsetByOverlaps with invert=TRUE,
but fractional overlap restrictions and N make that more
complicated.
A language object containing the compiled R code, evaluating to a
GRanges object, except when A or N are TRUE,
where the value might be a GRanges, GAlignments or VCF object,
depending on the input.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/subtract.html
setops-methods for set operations including setdiff, findOverlaps-methods for different ways to detect overlaps.
## Not run: setwd(system.file("unitTests", "data", "subtract", package="HelloRanges")) ## End(Not run) ## simple case, position-wise subtraction bedtools_subtract("-a a.bed -b b.bed") ## fractional overlap restriction bedtools_subtract("-a a.bed -b b.bed -f 0.5") ## range-wise subtraction bedtools_subtract("-a a.bed -b b.bed -A -f 0.5")## Not run: setwd(system.file("unitTests", "data", "subtract", package="HelloRanges")) ## End(Not run) ## simple case, position-wise subtraction bedtools_subtract("-a a.bed -b b.bed") ## fractional overlap restriction bedtools_subtract("-a a.bed -b b.bed -f 0.5") ## range-wise subtraction bedtools_subtract("-a a.bed -b b.bed -A -f 0.5")
Summarize the ranges according to disjoin
and construct a matrix of scores (disjoint range by
sample/file). Empty cells are filled with NA.
bedtools_unionbedg(cmd = "--help") R_bedtools_unionbedg(i, header=FALSE, names=NULL, g=NA_character_, empty=FALSE) do_bedtools_unionbedg(i, header=FALSE, names=NULL, g=NA_character_, empty=FALSE)bedtools_unionbedg(cmd = "--help") R_bedtools_unionbedg(i, header=FALSE, names=NULL, g=NA_character_, empty=FALSE) do_bedtools_unionbedg(i, header=FALSE, names=NULL, g=NA_character_, empty=FALSE)
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
i |
Paths to BAM/BED/GFF/VCF/etc files (vector or comma-separated), or a list of objects. |
header |
Ignored. |
names |
Provide an alias for each to use for each |
g |
A genome file, identifier or Seqinfo object that defines the order and size of the sequences. |
empty |
Report empty regions (i.e., regions not covered in any of the
files). This essentially yields a partitioning of the genome (and
thus requires |
As with all commands, there are three interfaces to the
unionbedg command:
bedtools_unionbedgParses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_unionbedgAccepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_unionbedgEvaluates the result of
R_bedtools_unionbedg. Recommended only for
demonstration and testing. It is best to integrate the compiled
code into an R script, after studying it.
This is essentially the same operation as
bedtools_multiinter, except we build a score matrix
and embed it into a SummarizedExperiment. This is a bit tricky and
relies on the as.matrix,AtomicList-method coercion.
A language object containing the compiled R code, evaluating to a RangedSummarizedExperiment with an assay called “score”.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/unionbedg.html
disjoin for forming disjoint ranges, RangedSummarizedExperiment-class for SummarizedExperiment objects.
## Not run: setwd(system.file("unitTests", "data", "unionbedg", package="HelloRanges")) ## End(Not run) ## combine three samples bedtools_unionbedg("-i a.bedGraph,b.bedGraph,c.bedGraph -names A,B,C") ## include empty ranges (filled with NAs) bedtools_unionbedg("-i a.bedGraph,b.bedGraph,c.bedGraph -names A,B,C -empty -g test.genome")## Not run: setwd(system.file("unitTests", "data", "unionbedg", package="HelloRanges")) ## End(Not run) ## combine three samples bedtools_unionbedg("-i a.bedGraph,b.bedGraph,c.bedGraph -names A,B,C") ## include empty ranges (filled with NAs) bedtools_unionbedg("-i a.bedGraph,b.bedGraph,c.bedGraph -names A,B,C -empty -g test.genome")
Computes the mode (and “antimode”) of a distribution. It is not
clear whether this is a generally useful statistic, but
bedtools defined it, so we did for completeness.
distmode(x, anti = FALSE)distmode(x, anti = FALSE)
x |
The vector for which the mode is computed. |
anti |
Whether to return the value with the least representation, instead of the highest. |
There are methods for List subclasses and ordinary vectors/factors. The List methods are useful for aggregation.
The value that is the most (or least) prevalent in the x.
Michael Lawrence
Not to be confused with the data mode of a vector.
bedtools_map for an example in the context of aggregation.
distmode(c(1L, 2L, 1L))distmode(c(1L, 2L, 1L))
Creates a Pairs from two vectors, optionally via a left outer join.
pair(x, y, ...) ## S4 method for signature 'Vector,Vector' pair(x, y, by = findMatches(x, y), all.x = FALSE, NA.VALUE = y[NA])pair(x, y, ...) ## S4 method for signature 'Vector,Vector' pair(x, y, by = findMatches(x, y), all.x = FALSE, NA.VALUE = y[NA])
x |
The “first” vector. |
y |
The “second” vector. |
by |
The |
all.x |
If |
NA.VALUE |
The value to fill holes in |
... |
Arguments for methods. |
This might move to S4Vectors at some point. It is distinct from
simple Pairs construction, because it
performs transformations like a left outer join. More options might be
added in the future.
For GRanges and other ranged objects, pair adds “.” to
the seqlevels, because that is the seqname of the missing GRanges.
A Pairs object
Michael Lawrence
Pairs-class, created by this function.
bedtools_intersect, whose loj argument motivated
the creation of this function.