Title: | megadepth: BigWig and BAM related utilities |
---|---|
Description: | This package provides an R interface to Megadepth by Christopher Wilks available at https://github.com/ChristopherWilks/megadepth. It is particularly useful for computing the coverage of a set of genomic regions across bigWig or BAM files. With this package, you can build base-pair coverage matrices for regions or annotations of your choice from BigWig files. Megadepth was used to create the raw files provided by https://bioconductor.org/packages/recount3. |
Authors: | Leonardo Collado-Torres [aut] , David Zhang [aut, cre] |
Maintainer: | David Zhang <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.17.0 |
Built: | 2024-10-30 08:43:36 UTC |
Source: | https://github.com/bioc/megadepth |
Given an input BAM file, convert this to the BigWig format which can then be
used in get_coverage()
.
bam_to_bigwig( bam_file, prefix = file.path(tempdir(), basename(bam_file)), min_unique_qual = FALSE, double_count = FALSE, overwrite = FALSE )
bam_to_bigwig( bam_file, prefix = file.path(tempdir(), basename(bam_file)), min_unique_qual = FALSE, double_count = FALSE, overwrite = FALSE )
bam_file |
A |
prefix |
A |
min_unique_qual |
A |
double_count |
A |
overwrite |
A |
Note that this functionality is currently not supported on Windows by Megadepth.
A character()
with the path to the output files(s).
## Install the latest version if necessary install_megadepth(force = TRUE) ## Find the example BAM file example_bam <- system.file("tests", "test.bam", package = "megadepth", mustWork = TRUE ) ## Create the BigWig file ## Currently Megadepth does not support this on Windows if (!xfun::is_windows()) { example_bw <- bam_to_bigwig(example_bam, overwrite = TRUE) ## Path to the output file(s) generated by bam_to_bigwig() example_bw ## Use the all.bw file in get_coverage(), first find an annotation file annotation_file <- system.file("tests", "testbw2.bed", package = "megadepth", mustWork = TRUE ) ## Compute the coverage bw_cov <- get_coverage( example_bw["all.bw"], op = "mean", annotation = annotation_file ) bw_cov }
## Install the latest version if necessary install_megadepth(force = TRUE) ## Find the example BAM file example_bam <- system.file("tests", "test.bam", package = "megadepth", mustWork = TRUE ) ## Create the BigWig file ## Currently Megadepth does not support this on Windows if (!xfun::is_windows()) { example_bw <- bam_to_bigwig(example_bam, overwrite = TRUE) ## Path to the output file(s) generated by bam_to_bigwig() example_bw ## Use the all.bw file in get_coverage(), first find an annotation file annotation_file <- system.file("tests", "testbw2.bed", package = "megadepth", mustWork = TRUE ) ## Compute the coverage bw_cov <- get_coverage( example_bw["all.bw"], op = "mean", annotation = annotation_file ) bw_cov }
Given a BAM file, extract junction information including co-ordinates, strand, anchor length for each junction read. For details on the format of the output TSV file, check https://github.com/ChristopherWilks/megadepth#junctions.
bam_to_junctions( bam_file, prefix = file.path(tempdir(), basename(bam_file)), all_junctions = TRUE, junctions = FALSE, long_reads = FALSE, filter_in = 65535, filter_out = 260, overwrite = FALSE )
bam_to_junctions( bam_file, prefix = file.path(tempdir(), basename(bam_file)), all_junctions = TRUE, junctions = FALSE, long_reads = FALSE, filter_in = 65535, filter_out = 260, overwrite = FALSE )
bam_file |
A |
prefix |
A |
all_junctions |
A |
junctions |
A |
long_reads |
A |
filter_in |
A |
filter_out |
A |
overwrite |
A |
A character(1)
with the path to the output junction tsv file.
## Install if necessary install_megadepth() ## Find the example BAM file example_bam <- system.file("tests", "test.bam", package = "megadepth", mustWork = TRUE ) ## Run bam_to_junctions() example_jxs <- bam_to_junctions(example_bam, overwrite = TRUE) ## Path to the output file generated by bam_to_junctions() example_jxs
## Install if necessary install_megadepth() ## Find the example BAM file example_bam <- system.file("tests", "test.bam", package = "megadepth", mustWork = TRUE ) ## Run bam_to_junctions() example_jxs <- bam_to_junctions(example_bam, overwrite = TRUE) ## Path to the output file generated by bam_to_junctions() example_jxs
Given an input set of annotation regions, compute coverage summarizations using Megadepth for a given BigWig file.
get_coverage( bigwig_file, op = c("sum", "mean", "max", "min"), annotation, prefix = file.path(tempdir(), "bw.mean") )
get_coverage( bigwig_file, op = c("sum", "mean", "max", "min"), annotation, prefix = file.path(tempdir(), "bw.mean") )
bigwig_file |
A |
op |
A |
annotation |
A |
prefix |
A |
Note that the chromosome names (seqnames) in the BigWig file and the annotation file should use the same format. Otherwise, Megadepth will return 0 counts.
A GRanges-class object with the coverage summarization across the annotation ranges.
Other Coverage functions:
read_coverage()
## Install if necessary install_megadepth() ## Next, we locate the example BigWig and annotation files example_bw <- system.file("tests", "test.bam.all.bw", package = "megadepth", mustWork = TRUE ) annotation_file <- system.file("tests", "testbw2.bed", package = "megadepth", mustWork = TRUE ) ## Compute the coverage bw_cov <- get_coverage(example_bw, op = "mean", annotation = annotation_file) bw_cov ## If you want to cast this into a RleList object use the following code: ## (it's equivalent to rtracklayer::import.bw(as = "RleList")) ## although in the megadepth case the data has been summarized GenomicRanges::coverage(bw_cov) ## Checking with derfinder and rtracklayer bed <- rtracklayer::import(annotation_file) ## The file needs a name names(example_bw) <- "example" ## Read in the base-pair coverage data if (!xfun::is_windows()) { regionCov <- derfinder::getRegionCoverage( regions = bed, files = example_bw, verbose = FALSE ) ## Summarize the base-pair coverage data. ## Note that we have to round the mean to make them comparable. testthat::expect_equivalent( round(sapply(regionCov[c(1, 3:4, 2)], function(x) mean(x$value)), 2), bw_cov$score, ) ## If we compute the sum, there's no need to round testthat::expect_equivalent( sapply(regionCov[c(1, 3:4, 2)], function(x) sum(x$value)), get_coverage(example_bw, op = "sum", annotation = annotation_file)$score, ) }
## Install if necessary install_megadepth() ## Next, we locate the example BigWig and annotation files example_bw <- system.file("tests", "test.bam.all.bw", package = "megadepth", mustWork = TRUE ) annotation_file <- system.file("tests", "testbw2.bed", package = "megadepth", mustWork = TRUE ) ## Compute the coverage bw_cov <- get_coverage(example_bw, op = "mean", annotation = annotation_file) bw_cov ## If you want to cast this into a RleList object use the following code: ## (it's equivalent to rtracklayer::import.bw(as = "RleList")) ## although in the megadepth case the data has been summarized GenomicRanges::coverage(bw_cov) ## Checking with derfinder and rtracklayer bed <- rtracklayer::import(annotation_file) ## The file needs a name names(example_bw) <- "example" ## Read in the base-pair coverage data if (!xfun::is_windows()) { regionCov <- derfinder::getRegionCoverage( regions = bed, files = example_bw, verbose = FALSE ) ## Summarize the base-pair coverage data. ## Note that we have to round the mean to make them comparable. testthat::expect_equivalent( round(sapply(regionCov[c(1, 3:4, 2)], function(x) mean(x$value)), 2), bw_cov$score, ) ## If we compute the sum, there's no need to round testthat::expect_equivalent( sapply(regionCov[c(1, 3:4, 2)], function(x) sum(x$value)), get_coverage(example_bw, op = "sum", annotation = annotation_file)$score, ) }
Download the appropriate Megadepth executable for your platform from Github
and try to copy it to a system directory so megadepth can run the
megadepth
command.
install_megadepth(version = "latest", force = FALSE)
install_megadepth(version = "latest", force = FALSE)
version |
A |
force |
A |
If this function is run in an non-interactive session such as R CMD
Check
, it will install Megadepth to tempdir()
. If this function is
run interactively, the user will be prompted to agree to allow Megadepth to
be installed at Sys.getenv('APPDATA')
on Windows,
‘~/Library/Application Support’ on macOS, and ‘~/bin/’ on other
platforms (such as Linux). If these directories are not writable, the package
directory ‘Megadepth’ of megadepth will be used. If it still
fails, you have to install Megadepth by yourself and make sure it can be
found via the environment variable PATH
.
If you want to install Megadepth to a custom path, you can set the global
option megadepth.dir
to a directory to store the Megadepth executable
before you call install_megadepth()
, e.g.,
options(megadepth.hugo.dir = '~/Downloads/Megadepth_1.0.4/')
.
This may be useful for you to use a specific
version of Megadepth for a specific project. You can set this option per
project, similar to how blogdown.hugo.dir
is used for specifying
the directory for Hugo in the blogdown package..
See Section
1.4 Global options for details, or store a copy of Megadepth on a USB Flash
drive along with your project code.
Returns NULL
. The main use is to install Megadepth.
This function is based on blogdown::install_hugo() which is available from https://github.com/rstudio/blogdown/blob/master/R/install.R.
## Install megadepth install_megadepth()
## Install megadepth install_megadepth()
Wrapper functions to run Megadepth commands via
system2('megadepth', ...)
.
megadepth_cmd(...) megadepth_shell(input = ".", ...)
megadepth_cmd(...) megadepth_shell(input = ".", ...)
... |
Arguments to be passed to |
input |
A |
See base::system2()
for the types of output you can generate.
A character()
with the capture of the standard output stream
generated by Megadepth.
megadepth_cmd()
: Run an arbitrary Megadepth command.
megadepth_shell()
: Run an arbitrary Megadepth command.
megadepth_cmd()
is based on blogdown::hugo_cmd() which is available at
https://github.com/rstudio/blogdown/blob/master/R/hugo.R.
megadepth_shell()
is based on the shell_ls() example from cmdfun which is
available at https://snystrom.github.io/cmdfun/index.html.
## Install if necessary install_megadepth() ## Find version ## megadepth_shell() provides an interface more familiar to R users megadepth_shell(version = TRUE) ## megadepth_cmd() requires using directly the command line syntax for ## Megadepth megadepth_cmd("--version", stdout = TRUE) ## Compare the help files: # megadepth_shell() captures the standard output and returns a character() # megadepth_cmd() shows the standard output on the console megadepth_shell("--help") megadepth_cmd("--help")
## Install if necessary install_megadepth() ## Find version ## megadepth_shell() provides an interface more familiar to R users megadepth_shell(version = TRUE) ## megadepth_cmd() requires using directly the command line syntax for ## Megadepth megadepth_cmd("--version", stdout = TRUE) ## Compare the help files: # megadepth_shell() captures the standard output and returns a character() # megadepth_cmd() shows the standard output on the console megadepth_shell("--help") megadepth_cmd("--help")
Parses the junctions outputted from process_junction_table()
into an STAR
compatible format (SJ.out) for more convenient use in downstream analyses.
The columns strand, intron_motif and annotated will always be 0 (undefined)
but can be derived through extracting the dinucleotide motifs for the given
reference coordinates for canonical motifs. This function is an
R-implementation of the Megadepth helper script, on which further details of
column definitions can be found:
https://github.com/ChristopherWilks/megadepth#junctions.
process_junction_table(all_jxs)
process_junction_table(all_jxs)
all_jxs |
A |
Processed junctions in a STAR-compatible format.
## Install if necessary install_megadepth() ## Find the example BAM file example_bam <- system.file("tests", "test.bam", package = "megadepth", mustWork = TRUE ) ## Run bam_to_junctions() example_jxs <- bam_to_junctions(example_bam, overwrite = TRUE) ## Read the junctions in as a tibble all_jxs <- read_junction_table(example_jxs[["all_jxs.tsv"]]) ## Process junctions into a STAR-compatible format processed_jxs <- process_junction_table(all_jxs) processed_jxs
## Install if necessary install_megadepth() ## Find the example BAM file example_bam <- system.file("tests", "test.bam", package = "megadepth", mustWork = TRUE ) ## Run bam_to_junctions() example_jxs <- bam_to_junctions(example_bam, overwrite = TRUE) ## Read the junctions in as a tibble all_jxs <- read_junction_table(example_jxs[["all_jxs.tsv"]]) ## Process junctions into a STAR-compatible format processed_jxs <- process_junction_table(all_jxs) processed_jxs
Read an *annotation.tsv
file created by get_coverage()
or manually by
the user using Megadepth.
read_coverage(tsv_file, verbose = TRUE) read_coverage_table(tsv_file)
read_coverage(tsv_file, verbose = TRUE) read_coverage_table(tsv_file)
tsv_file |
A |
verbose |
A |
A GRanges-class object with the coverage summarization across the annotation ranges.
A tibble::tibble()
with columns chr
, start
, end
and score
.
read_coverage_table()
: Read a coverage TSV file created by Megadepth as
a table
Other Coverage functions:
get_coverage()
## Install if necessary install_megadepth() ## Locate example BigWig and annotation files example_bw <- system.file("tests", "test.bam.all.bw", package = "megadepth", mustWork = TRUE ) annotation_file <- system.file("tests", "testbw2.bed", package = "megadepth", mustWork = TRUE ) ## Compute the coverage bw_cov <- get_coverage(example_bw, op = "mean", annotation = annotation_file) bw_cov ## Read in the coverage file again, using read_coverage() ## First, lets locate the tsv file that was generated by get_coverage() tsv_file <- file.path(tempdir(), "bw.mean.annotation.tsv") bw_cov_manual <- read_coverage(tsv_file) stopifnot(identical(bw_cov, bw_cov_manual)) ## To get an RleList object, just like the one you would get ## from using rtracklayer::import.bw(as = "RleList") directly on the ## BigWig file, use: GenomicRanges::coverage(bw_cov_manual) ## The coverage data can also be read as a `tibble::tibble()` read_coverage_table(tsv_file)
## Install if necessary install_megadepth() ## Locate example BigWig and annotation files example_bw <- system.file("tests", "test.bam.all.bw", package = "megadepth", mustWork = TRUE ) annotation_file <- system.file("tests", "testbw2.bed", package = "megadepth", mustWork = TRUE ) ## Compute the coverage bw_cov <- get_coverage(example_bw, op = "mean", annotation = annotation_file) bw_cov ## Read in the coverage file again, using read_coverage() ## First, lets locate the tsv file that was generated by get_coverage() tsv_file <- file.path(tempdir(), "bw.mean.annotation.tsv") bw_cov_manual <- read_coverage(tsv_file) stopifnot(identical(bw_cov, bw_cov_manual)) ## To get an RleList object, just like the one you would get ## from using rtracklayer::import.bw(as = "RleList") directly on the ## BigWig file, use: GenomicRanges::coverage(bw_cov_manual) ## The coverage data can also be read as a `tibble::tibble()` read_coverage_table(tsv_file)
Read an *all_jxs.tsv
or *jxs.tsv
file created by bam_to_junctions()
or
manually by the user using Megadepth. The rows of a *jxs.tsv
can have
either 7 or 14 columns, which can lead to warnings when reading in - these
are safe to ignore. For details on the format of the input TSV file, check
https://github.com/ChristopherWilks/megadepth#junctions.
read_junction_table(tsv_file)
read_junction_table(tsv_file)
tsv_file |
A |
A tibble::tibble()
with the junction data that follows the format
specified at https://github.com/ChristopherWilks/megadepth#junctions.
## Install if necessary install_megadepth() ## Find the example BAM file example_bam <- system.file("tests", "test.bam", package = "megadepth", mustWork = TRUE ) ## Run bam_to_junctions() example_jxs <- bam_to_junctions(example_bam, overwrite = TRUE) ## Read the junctions in as a tibble all_jxs <- read_junction_table(example_jxs[["all_jxs.tsv"]]) all_jxs
## Install if necessary install_megadepth() ## Find the example BAM file example_bam <- system.file("tests", "test.bam", package = "megadepth", mustWork = TRUE ) ## Run bam_to_junctions() example_jxs <- bam_to_junctions(example_bam, overwrite = TRUE) ## Read the junctions in as a tibble all_jxs <- read_junction_table(example_jxs[["all_jxs.tsv"]]) all_jxs