Title: | Comparison, Benchmarking & QC of Epigenomic Datasets |
---|---|
Description: | EpiCompare is used to compare and analyse epigenetic datasets for quality control and benchmarking purposes. The package outputs an HTML report consisting of three sections: (1. General metrics) Metrics on peaks (percentage of blacklisted and non-standard peaks, and peak widths) and fragments (duplication rate) of samples, (2. Peak overlap) Percentage and statistical significance of overlapping and non-overlapping peaks. Also includes upset plot and (3. Functional annotation) functional annotation (ChromHMM, ChIPseeker and enrichment analysis) of peaks. Also includes peak enrichment around TSS. |
Authors: | Sera Choi [aut] , Brian Schilder [aut] , Leyla Abbasova [aut], Alan Murphy [aut] , Nathan Skene [aut] , Thomas Roberts [ctb], Hiranyamaya Dash [cre] |
Maintainer: | Hiranyamaya Dash <[email protected]> |
License: | GPL-3 |
Version: | 1.11.0 |
Built: | 2024-10-30 07:23:53 UTC |
Source: | https://github.com/bioc/EpiCompare |
Wrapper function for bplapply that automatically handles issues with BiocParallel related to different OS platforms.
bpplapply( X, FUN, apply_fun = parallel::mclapply, workers = check_workers(), progressbar = workers > 1, verbose = workers == 1, use_snowparam = TRUE, register_now = FALSE, ... )
bpplapply( X, FUN, apply_fun = parallel::mclapply, workers = check_workers(), progressbar = workers > 1, verbose = workers == 1, use_snowparam = TRUE, register_now = FALSE, ... )
X |
Any object for which methods |
FUN |
The |
apply_fun |
Iterator function to use. |
workers |
Number of threads to parallelize across. |
progressbar |
|
verbose |
Print messages. |
use_snowparam |
Whether to use
SnowParam (default: |
register_now |
Register the cores now with
register ( |
... |
Arguments passed on to
|
(Named) list.
X <- stats::setNames(seq_len(length(letters)), letters) out <- bpplapply(X, print)
X <- stats::setNames(seq_len(length(letters)), letters) out <- bpplapply(X, print)
Assign parallel worker cores.
check_workers(workers = NULL)
check_workers(workers = NULL)
workers |
Number of cores to parallelise across
(in applicable functions).
If |
Integer
workers <- check_workers()
workers <- check_workers()
Human H3K27ac peak file generated with CUT&Run using K562 cell-line from Meers et al., (2019). Human genome build hg19 was used. Raw peak file (.BED) was obtained from GEO (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8581604). Peak calling was performed by Leyla Abbasova using MACS2. The peak file was then processed into GRanges object. Peaks located on chromosome 1 were subsetted to reduce the dataset size.
data("CnR_H3K27ac")
data("CnR_H3K27ac")
An object of class GRanges
of length 2707.
The code to prepare the .Rda file from the raw peak file is:
# sequences were directly downloaded from
https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8581604
# and peaks (BED file) were generated by Leyla Abbasova
(Neurogenomics Lab, Imperial College London)
CnR_H3K27ac <- ChIPseeker::readPeakFile("path", as = "GRanges")
CnR_H3K27ac <- CnR_H3K27ac[seqnames(CnR_H3K27ac)== "chr1"]
my_label <-
c("name","score","strand","signalValue","pValue","qValue","peak")
colnames(GenomicRanges::mcols(CnR_H3K27ac)) <- my_label
usethis::use_data(CnR_H3K27ac, overwrite = TRUE)
Duplication metrics output on CUT&Run H3K27ac file (sample accession: SRR8581604). Raw sequences were aligned to hg19 genome and after, Picard was performed by Leyla Abbasova. The duplication summary output generated by Picard was processed to reduce the size of data.
data("CnR_H3K27ac_picard")
data("CnR_H3K27ac_picard")
An object of class data.frame
with 1 rows and 10 columns.
The code to prepare the .Rda file is:
picard <- read.table("path/to/picard/duplication/output",
header = TRUE, fill = TRUE)
CnR_H3K27ac_picard <- picard[1,]
usethis::use_data(CnR_H3K27ac_picard, overwrite = TRUE)
Human H3K27ac peak file generated with CUT&Tag using K562 cell-line from Kaya-Okur et al., (2019). Human genome build hg19 was used. Raw peak file (.BED) was obtained from GEO (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8383507). Peak calling was performed by Leyla Abbasova using MACS2. The peak file was then imported as an GRanges object. Peaks located on chromosome 1 were subsetted to reduce the dataset size.
data("CnT_H3K27ac")
data("CnT_H3K27ac")
An object of class GRanges
of length 1670.
The code to prepare the .Rda file from the raw peak file is:
# sequences were directly downloaded from
https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8383507
# and peaks (BED file) were generated by Leyla Abbasova
(Neurogenomics Lab, Imperial College London)
CnT_H3K27ac <- ChIPseeker::readPeakFile("path", as = "GRanges")
CnT_H3K27ac <- CnT_H3K27ac[seqnames(CnT_H3K27ac)== "chr1"]
my_label <-
c("name","score","strand","signalValue","pValue","qValue","peak")
colnames(GenomicRanges::mcols(CnT_H3K27ac)) <- my_label
usethis::use_data(CnT_H3K27ac)
Duplication metrics output of CUT&Tag H3K27ac file (sample accession: SRR8581604). Raw sequences were aligned to hg19 genome and Picard was performed by Leyla Abbasova. The duplication summary output generated by Picard was processed to reduce the size of data.
data("CnT_H3K27ac_picard")
data("CnT_H3K27ac_picard")
An object of class data.frame
with 1 rows and 10 columns.
The code to prepare the .Rda file is:
picard <- read.table("path/to/picard/duplication/output",
header = TRUE, fill = TRUE)]
CnT_H3K27ac_picard <- picard[1,]
usethis::use_data(CnT_H3K27ac_picard, overwrite = TRUE)
Compute consensus peaks from a list of GRanges.
compute_consensus_peaks( grlist, groups = NULL, genome_build, lower = 2, upper = Inf, min.gapwidth = 1L, method = c("granges", "consensusseeker"), ... )
compute_consensus_peaks( grlist, groups = NULL, genome_build, lower = 2, upper = Inf, min.gapwidth = 1L, method = c("granges", "consensusseeker"), ... )
grlist |
Named list of GRanges objects. |
groups |
A character vector of the same length as |
genome_build |
Genome build name. |
lower , upper
|
The lower and upper bounds for the slice. |
min.gapwidth |
Ranges separated by a gap of at least |
method |
Method to call peaks with:
|
... |
Arguments passed on to
|
NOTE: If you get the error
"Error in serialize(data, node$con) : error writing to connection"
,
try running closeAllConnections
and rerun compute_consensus_peaks.
This error can sometimes occur when
compute_consensus_peaks
has been disrupted partway through.
Named list of consensus peak GRanges.
data("encode_H3K27ac") # example dataset as GRanges object data("CnT_H3K27ac") # example dataset as GRanges object data("CnR_H3K27ac") # example dataset as GRanges object grlist <- list(CnR=CnR_H3K27ac, CnT=CnT_H3K27ac, ENCODE=encode_H3K27ac) consensus_peaks <- compute_consensus_peaks(grlist = grlist, groups = c("Imperial", "Imperial", "ENCODE"))
data("encode_H3K27ac") # example dataset as GRanges object data("CnT_H3K27ac") # example dataset as GRanges object data("CnR_H3K27ac") # example dataset as GRanges object grlist <- list(CnR=CnR_H3K27ac, CnT=CnT_H3K27ac, ENCODE=encode_H3K27ac) consensus_peaks <- compute_consensus_peaks(grlist = grlist, groups = c("Imperial", "Imperial", "ENCODE"))
Compute correlation matrix on all peak files.
compute_corr( peakfiles, reference = NULL, genome_build, keep_chr = NULL, drop_empty_chr = FALSE, bin_size = 5000, method = "spearman", intensity_cols = c("total_signal", "qValue", "Peak Score", "score"), return_bins = FALSE, fill_diag = NA, workers = check_workers(), save_path = tempfile(fileext = ".corr.csv.gz") )
compute_corr( peakfiles, reference = NULL, genome_build, keep_chr = NULL, drop_empty_chr = FALSE, bin_size = 5000, method = "spearman", intensity_cols = c("total_signal", "qValue", "Peak Score", "score"), return_bins = FALSE, fill_diag = NA, workers = check_workers(), save_path = tempfile(fileext = ".corr.csv.gz") )
peakfiles |
A list of peak files as GRanges object and/or as paths to
BED files. If paths are provided, EpiCompare imports the file as GRanges
object. EpiCompare also accepts a list containing a mix of GRanges objects
and paths.Files must be listed and named using |
reference |
A named list containing reference peak file(s) as GRanges
object. Please ensure that the reference file is listed and named
i.e. |
genome_build |
The build of **all** peak and reference files to calculate the correlation matrix on. If all peak and reference files are not of the same build use liftover_grlist to convert them all before running. Genome build should be one of hg19, hg38, mm9, mm10. |
keep_chr |
Which chromosomes to keep. |
drop_empty_chr |
Drop chromosomes that are not present in any of the
|
bin_size |
Default of 100. Base-pair size of the bins created to measure correlation. Use smaller value for higher resolution but longer run time and larger memory usage. |
method |
Default spearman (i.e. non-parametric). A character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson", "kendall", or "spearman": can be abbreviated. |
intensity_cols |
Depending on which columns are present, this value will be used to get quantiles and ultimately calculate the correlations:
|
return_bins |
If |
fill_diag |
Fill the diagonal of the overlap matrix. |
workers |
Number of threads to parallelize across. |
save_path |
Path to save a table of correlation results to. |
correlation matrix
data("CnR_H3K27ac") data("CnT_H3K27ac") data("encode_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) reference <- list("encode_H3K27ac"=encode_H3K27ac) #increasing bin_size for speed but lower values will give more granular corr corr_mat <- compute_corr(peakfiles = peakfiles, reference = reference, genome_build = "hg19", bin_size = 200000, workers = 1)
data("CnR_H3K27ac") data("CnT_H3K27ac") data("encode_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) reference <- list("encode_H3K27ac"=encode_H3K27ac) #increasing bin_size for speed but lower values will give more granular corr corr_mat <- compute_corr(peakfiles = peakfiles, reference = reference, genome_build = "hg19", bin_size = 200000, workers = 1)
Save an object as RDS and create a download button that can be rendered to Rmarkdown HTML pages. Uses the package downloadthis.
download_button( object, save_output = FALSE, outfile_dir = NULL, filename = NULL, button_label = paste0("Download: ", "<code>", filename, "</code>"), output_extension = ".rds", icon = "fa fa-save", button_type = "success", self_contained = TRUE, add_download_button = TRUE, verbose = TRUE )
download_button( object, save_output = FALSE, outfile_dir = NULL, filename = NULL, button_label = paste0("Download: ", "<code>", filename, "</code>"), output_extension = ".rds", icon = "fa fa-save", button_type = "success", self_contained = TRUE, add_download_button = TRUE, verbose = TRUE )
object |
R object to serialize. |
save_output |
Default FALSE. If TRUE, all outputs (tables and plots) of the analysis will be saved in a folder (EpiCompare_file). |
outfile_dir |
Directory to save the file to. |
filename |
Name of the file to save. |
button_label |
Character (HTML), button label |
output_extension |
Extension of the output file. Currently, |
icon |
Fontawesome tag e.g.: "fa fa-save" |
button_type |
Character, one of the standard Bootstrap types |
self_contained |
A boolean to specify whether your HTML output is
self-contained. Default to |
add_download_button |
Add download buttons for each plot or dataset. |
verbose |
Print messages. |
Download button as HTML text.
button <- download_button(object=mtcars)
button <- download_button(object=mtcars)
Human H3K27ac peak file generated with ChIP-seq using K562 cell-line. Human genome build hg19 was used. The peak file (.BED) was obtained from ENCODE project (https://www.encodeproject.org/files/ENCFF044JNJ/). The BED file was then imported as an GRanges object. Peaks located on chromosome 1 were subsetted to reduce the dataset size.
data("encode_H3K27ac")
data("encode_H3K27ac")
An object of class GRanges
of length 5142.
The code to prepare the .Rda file from the raw peak file is:
# dataset was directly downloaded from
# https://www.encodeproject.org/files/ENCFF044JNJ/
encode_H3K27ac <- ChIPseeker::readPeakFile("path", as = "GRanges")
encode_H3K27ac <-
encode_H3K27ac[seqnames(encode_H3K27ac) == "chr1"]
my_label <-
c("name","score","strand","signalValue","pValue","qValue","peak")
colnames(GenomicRanges::mcols(encode_H3K27ac)) <- my_label
usethis::use_data(encode_H3K27ac, overwrite = TRUE)
This function compares and analyses multiple epigenomic datasets and outputs an HTML report containing all results of the analysis. The report is mainly divided into three sections: (1) General Metrics on Peakfiles, (2) Peak Overlaps and (3) Functional Annotation of Peaks.
EpiCompare( peakfiles, genome_build, genome_build_output = "hg19", blacklist = NULL, picard_files = NULL, reference = NULL, upset_plot = FALSE, stat_plot = FALSE, chromHMM_plot = FALSE, chromHMM_annotation = "K562", chipseeker_plot = FALSE, enrichment_plot = FALSE, tss_plot = FALSE, tss_distance = c(-3000, 3000), precision_recall_plot = FALSE, n_threshold = 20, corr_plot = FALSE, bin_size = 5000, interact = TRUE, add_download_button = FALSE, save_output = FALSE, output_filename = "EpiCompare", output_timestamp = FALSE, output_dir, display = NULL, run_all = FALSE, workers = 1, quiet = FALSE, error = FALSE, debug = FALSE )
EpiCompare( peakfiles, genome_build, genome_build_output = "hg19", blacklist = NULL, picard_files = NULL, reference = NULL, upset_plot = FALSE, stat_plot = FALSE, chromHMM_plot = FALSE, chromHMM_annotation = "K562", chipseeker_plot = FALSE, enrichment_plot = FALSE, tss_plot = FALSE, tss_distance = c(-3000, 3000), precision_recall_plot = FALSE, n_threshold = 20, corr_plot = FALSE, bin_size = 5000, interact = TRUE, add_download_button = FALSE, save_output = FALSE, output_filename = "EpiCompare", output_timestamp = FALSE, output_dir, display = NULL, run_all = FALSE, workers = 1, quiet = FALSE, error = FALSE, debug = FALSE )
peakfiles |
A list of peak files as GRanges object and/or as paths to
BED files. If paths are provided, EpiCompare imports the file as GRanges
object. EpiCompare also accepts a list containing a mix of GRanges objects
and paths.Files must be listed and named using |
genome_build |
A named list indicating the human genome build used to generate each of the following inputs:
Example input list: |
genome_build_output |
Genome build to standardise all inputs to. Liftovers will be performed automatically as needed. Default: "hg19". |
blacklist |
A GRanges object containing blacklisted genomic regions. Blacklists included in EpiCompare are:
|
picard_files |
A list of summary metrics output from Picard.
Files must be in data.frame format and listed using |
reference |
A named list containing reference peak file(s) as GRanges
object. Please ensure that the reference file is listed and named
i.e. |
upset_plot |
Default FALSE. If TRUE, the report includes upset plot of overlapping peaks. |
stat_plot |
Default FALSE. If TRUE, the function creates a plot showing the statistical significance of overlapping/non-overlapping peaks. Reference peak file must be provided. |
chromHMM_plot |
Default FALSE. If TRUE, the function outputs ChromHMM heatmap of individual peak files. If a reference peak file is provided, ChromHMM annotation of overlapping and non-overlapping peaks is also provided. |
chromHMM_annotation |
ChromHMM annotation for ChromHMM plots. Default K562 cell-line. Cell-line options are:
|
chipseeker_plot |
Default FALSE. If TRUE, the report includes a barplot of ChIPseeker annotation of peak files. |
enrichment_plot |
Default FALSE. If TRUE, the report includes dotplots of KEGG and GO enrichment analysis of peak files. |
tss_plot |
Default FALSE. If TRUE, the report includes peak count frequency around transcriptional start site. Note that this can take awhile. |
tss_distance |
A vector specifying the distance upstream and downstream
around transcription start sites (TSS).
The default value is |
precision_recall_plot |
Default is FALSE. If TRUE, creates a precision-recall curve plot and an F1 plot using plot_precision_recall. |
n_threshold |
Number of thresholds to test. |
corr_plot |
Default is FALSE. If TRUE, creates a correlation plot across all peak files using plot_corr. |
bin_size |
Default of 100. Base-pair size of the bins created to measure correlation. Use smaller value for higher resolution but longer run time and larger memory usage. |
interact |
Default TRUE. By default, plots are interactive. If set FALSE, all plots in the report will be static. |
add_download_button |
Add download buttons for each plot or dataset. |
save_output |
Default FALSE. If TRUE, all outputs (tables and plots) of the analysis will be saved in a folder (EpiCompare_file). |
output_filename |
Default EpiCompare.html. If otherwise, the html report will be saved in the specified name. |
output_timestamp |
Default FALSE. If TRUE, date will be included in the file name. |
output_dir |
Path to where output HTML file should be saved. |
display |
After completion, automatically display the HTML report file in one of the following ways:
|
run_all |
Convenience argument that enables all plots/features
(without specifying each argument manually)
by overriding the default values.
Default: |
workers |
Number of threads to parallelize across. |
quiet |
An option to suppress printing during rendering from knitr,
pandoc command line and others. To only suppress printing of the last
"Output created: " message, you can set |
error |
If |
debug |
Run in debug mode, where are messages and warnings
are printed within the HTML report (default: |
Path to one or more HTML report files.
### Load Data ### data("encode_H3K27ac") # example dataset as GRanges object data("CnT_H3K27ac") # example dataset as GRanges object data("CnR_H3K27ac") # example dataset as GRanges object data("CnT_H3K27ac_picard") # example Picard summary output data("CnR_H3K27ac_picard") # example Picard summary output #### Prepare Input #### # create named list of peakfiles peakfiles <- list(CnR=CnR_H3K27ac, CnT=CnT_H3K27ac) # create named list of picard outputs picard_files <- list(CnR=CnR_H3K27ac_picard, CnT=CnT_H3K27ac_picard) # reference peak file reference <- list("ENCODE" = encode_H3K27ac) ### Run EpiCompare ### output_html <- EpiCompare(peakfiles = peakfiles, genome_build = list(peakfiles="hg19", reference="hg19"), picard_files = picard_files, reference = reference, output_filename = "EpiCompare_test", output_dir = tempdir()) # utils::browseURL(output_html)
### Load Data ### data("encode_H3K27ac") # example dataset as GRanges object data("CnT_H3K27ac") # example dataset as GRanges object data("CnR_H3K27ac") # example dataset as GRanges object data("CnT_H3K27ac_picard") # example Picard summary output data("CnR_H3K27ac_picard") # example Picard summary output #### Prepare Input #### # create named list of peakfiles peakfiles <- list(CnR=CnR_H3K27ac, CnT=CnT_H3K27ac) # create named list of picard outputs picard_files <- list(CnR=CnR_H3K27ac_picard, CnT=CnT_H3K27ac_picard) # reference peak file reference <- list("ENCODE" = encode_H3K27ac) ### Run EpiCompare ### output_html <- EpiCompare(peakfiles = peakfiles, genome_build = list(peakfiles="hg19", reference="hg19"), picard_files = picard_files, reference = reference, output_filename = "EpiCompare_test", output_dir = tempdir()) # utils::browseURL(output_html)
This function outputs a summary on fragments using metrics generated by Picard. Provides the number of mapped fragments, duplication rate and number of unique fragments.
fragment_info(picard_list)
fragment_info(picard_list)
picard_list |
Named list of duplication metrics generated by Picard
as data frame. Data frames must be named and listed using |
A table summarizing metrics on fragments.
### Load Data ### data(CnT_H3K27ac_picard) # example picard output data(CnR_H3K27ac_picard) # example picard output ### Import Picard Metrics ### # To import Picard duplication metrics (.txt file) into R as data frame # CnT_H3K27ac_picard <- read.table("/path/to/picard/output.txt", # header = TRUE,fill = TRUE) ### Create Named List ### picard_list <- list("CnT_H3K27ac"=CnT_H3K27ac_picard, "CnR_H3K27ac"=CnR_H3K27ac_picard) df <- fragment_info(picard_list = picard_list)
### Load Data ### data(CnT_H3K27ac_picard) # example picard output data(CnR_H3K27ac_picard) # example picard output ### Import Picard Metrics ### # To import Picard duplication metrics (.txt file) into R as data frame # CnT_H3K27ac_picard <- read.table("/path/to/picard/output.txt", # header = TRUE,fill = TRUE) ### Create Named List ### picard_list <- list("CnT_H3K27ac"=CnT_H3K27ac_picard, "CnR_H3K27ac"=CnR_H3K27ac_picard) df <- fragment_info(picard_list = picard_list)
Recursively find peak/picard files stored within subdirectories and import them as a list of GRanges objects.
gather_files( dir, type = "peaks.stringent", nfcore_cutandrun = FALSE, return_paths = FALSE, rbind_list = FALSE, workers = check_workers(), verbose = TRUE )
gather_files( dir, type = "peaks.stringent", nfcore_cutandrun = FALSE, return_paths = FALSE, rbind_list = FALSE, workers = check_workers(), verbose = TRUE )
dir |
Directory to search within. |
type |
File type to search for. Options include:
|
nfcore_cutandrun |
Whether the files were generated by the
nf-core/cutandrun Nextflow pipeline.
If |
return_paths |
Return only the file paths without actually reading them in as GRanges. |
rbind_list |
Bind all objects into one. |
workers |
Number of cores to parallelise across
(in applicable functions).
If |
verbose |
Print messages. |
For "peaks.stringent" files called with SEACR, column names will be automatically added:
total_signal : Total signal contained within denoted coordinates.
max_signalMaximum bedgraph signal attained at any base pair within denoted coordinates.
max_signal_region Region representing the farthest upstream and farthest downstream bases within the denoted coordinates that are represented by the maximum bedgraph signal.
A named list of GRanges objects.
#### Make example files #### save_paths <- EpiCompare::write_example_peaks() dir <- unique(dirname(save_paths)) #### Gather/import files #### peaks <- EpiCompare::gather_files(dir=dir, type="peaks.narrow", workers = 1)
#### Make example files #### save_paths <- EpiCompare::write_example_peaks() dir <- unique(dirname(save_paths)) #### Gather/import files #### peaks <- EpiCompare::gather_files(dir=dir, type="peaks.narrow", workers = 1)
Assign group names to each file in a named list based on a series of string searches based on combinations of relevant metadata factors.
group_files(peakfiles, searches)
group_files(peakfiles, searches)
peakfiles |
A list of peak files as GRanges object and/or as paths to
BED files. If paths are provided, EpiCompare imports the file as GRanges
object. EpiCompare also accepts a list containing a mix of GRanges objects
and paths.Files must be listed and named using |
searches |
A named list of substrings to group |
Named peak files
data("encode_H3K27ac") # example dataset as GRanges object data("CnT_H3K27ac") # example dataset as GRanges object data("CnR_H3K27ac") # example dataset as GRanges object peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac, encode_H3K27ac=encode_H3K27ac) peaks_grouped <- group_files(peakfiles = peakfiles, searches=list(assay=c("H3K27ac"), source=c("Cn","ENCODE")))
data("encode_H3K27ac") # example dataset as GRanges object data("CnT_H3K27ac") # example dataset as GRanges object data("CnR_H3K27ac") # example dataset as GRanges object peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac, encode_H3K27ac=encode_H3K27ac) peaks_grouped <- group_files(peakfiles = peakfiles, searches=list(assay=c("H3K27ac"), source=c("Cn","ENCODE")))
Obtained from https://www.encodeproject.org/files/ENCFF001TDO/. The ENCODE blacklist includes regions of the hg19 genome that have anomalous and/or unstructured signals independent of the cell-line or experiment. Removal of ENCODE blacklist is recommended for quality measure.
data("hg19_blacklist")
data("hg19_blacklist")
An object of class GRanges
of length 411.
The code to prepare the .Rda file is:
# blacklisted regions were directly downloaded
# from https://www.encodeproject.org/files/ENCFF001TDO/
hg19_blacklist <-
ChIPseeker::readPeakFile(file.path(path), as = "GRanges")
usethis::use_data(hg19_blacklist, overwrite = TRUE)
Obtained from https://www.encodeproject.org/files/ENCFF356LFX/. The ENCODE blacklist includes regions of the hg38 genome that have anomalous and/or unstructured signals independent of the cell-line or experiment. Removal of ENCODE blacklist is recommended for quality measure.
data("hg38_blacklist")
data("hg38_blacklist")
An object of class GRanges
of length 910.
The code to prepare the .Rda file is:
## blacklisted regions were directly downloaded
## from https://www.encodeproject.org/files/ENCFF356LFX/
hg38_blacklist <-
ChIPseeker::readPeakFile(file.path(path), as = "GRanges")
usethis::use_data(hg38_blacklist, overwrite = TRUE)
Perform genome build liftover to one or more GRanges objects at once.
liftover_grlist( grlist, input_build, output_build = "hg19", style = "UCSC", keep_chr = paste0("chr", c(seq_len(22), "X", "Y")), as_grangeslist = FALSE, merge_all = FALSE, verbose = TRUE )
liftover_grlist( grlist, input_build, output_build = "hg19", style = "UCSC", keep_chr = paste0("chr", c(seq_len(22), "X", "Y")), as_grangeslist = FALSE, merge_all = FALSE, verbose = TRUE )
grlist |
A named list of GRanges objects, or simply a single unlisted GRanges object. Can perform liftover within species or across species. |
input_build |
The genome build of |
output_build |
Desired genome build for
|
style |
Chromosome style, set by seqlevelsStyle.
|
keep_chr |
Which chromosomes to keep. |
as_grangeslist |
Return as a GRangesList. |
merge_all |
|
verbose |
Print messages. |
Named list of lifted GRanges objects.
grlist <- list("gr1"=GenomicRanges::GRanges("4:1-100000"), "gr2"=GenomicRanges::GRanges("6:1-100000"), "gr3"=GenomicRanges::GRanges("8:1-100000")) grlist_lifted <- liftover_grlist(grlist = grlist, input_build = "hg19", output_build="hg38")
grlist <- list("gr1"=GenomicRanges::GRanges("4:1-100000"), "gr2"=GenomicRanges::GRanges("6:1-100000"), "gr3"=GenomicRanges::GRanges("8:1-100000")) grlist_lifted <- liftover_grlist(grlist = grlist, input_build = "hg19", output_build="hg38")
Obtained from https://www.encodeproject.org/files/ENCFF547MET/. The ENCODE blacklist includes regions of the mm10 genome that have anomalous and/or unstructured signals independent of the cell-line or experiment. Removal of ENCODE blacklist is recommended for quality measure.
data("mm10_blacklist")
data("mm10_blacklist")
An object of class GRanges
of length 164.
The code to prepare the .Rda file is:
## blacklisted regions were directly downloaded
## from https://www.encodeproject.org/files/ENCFF547MET/
mm10_blacklist <-
ChIPseeker::readPeakFile(file.path(path), as = "GRanges")
usethis::use_data(mm10_blacklist, overwrite = TRUE)
Blaklisted regions of the mm9 genome build
btained by lifting over the mm10_blacklist
.
data("mm9_blacklist")
data("mm9_blacklist")
An object of class GRanges
of length 292.
tmp <- base::get("mm10_blacklist", asNamespace("EpiCompare"))
mm9_blacklist <- liftover_grlist(grlist = tmp,
input_build = "mm10",
output_build = "mm9",
keep_chr = NULL)
usethis::use_data(mm9_blacklist, overwrite = TRUE)
This function generates a heatmap showing percentage of overlapping peaks between peak files.
overlap_heatmap( peaklist, interact = TRUE, draw_cellnote = TRUE, fill_diag = NA, verbose = TRUE )
overlap_heatmap( peaklist, interact = TRUE, draw_cellnote = TRUE, fill_diag = NA, verbose = TRUE )
peaklist |
A list of peak files as GRanges object.
Files must be listed and named using |
interact |
Default TRUE. By default heatmap is interactive. If FALSE, heatmap is static. |
draw_cellnote |
Draw the numeric values within each heatmap cell. |
fill_diag |
Fill the diagonal of the overlap matrix. |
verbose |
Print messages. |
An interactive heatmap
### Load Data ### data("encode_H3K27ac") # example peakfile GRanges object data("CnT_H3K27ac") # example peakfile GRanges object ### Create Named List ### peaklist <- list("encode"=encode_H3K27ac, "CnT"=CnT_H3K27ac) ### Run ### my_heatmap <- overlap_heatmap(peaklist = peaklist)
### Load Data ### data("encode_H3K27ac") # example peakfile GRanges object data("CnT_H3K27ac") # example peakfile GRanges object ### Create Named List ### peaklist <- list("encode"=encode_H3K27ac, "CnT"=CnT_H3K27ac) ### Run ### my_heatmap <- overlap_heatmap(peaklist = peaklist)
This function calculates the percentage of overlapping peaks and outputs a table or matrix of results.
overlap_percent( peaklist1, peaklist2, invert = FALSE, precision_recall = TRUE, suppress_messages = TRUE )
overlap_percent( peaklist1, peaklist2, invert = FALSE, precision_recall = TRUE, suppress_messages = TRUE )
peaklist1 |
A list of peak files as GRanges object.
Files must be listed and named using |
peaklist2 |
peaklist1 A list of peak files as GRanges object.
Files must be listed and named using |
invert |
If |
precision_recall |
Return percision-recall results for all combinations
of |
suppress_messages |
Suppress messages. |
data frame
### Load Data ### data("encode_H3K27ac") # example peakfile GRanges object data("CnT_H3K27ac") # example peakfile GRanges object data("CnR_H3K27ac") # example peakfile GRanges object ### Create Named Peaklist ### peaks <- list("CnT"=CnT_H3K27ac, "CnR"=CnR_H3K27ac) reference_peak <- list("ENCODE"=encode_H3K27ac) ### Run ### overlap <- overlap_percent(peaklist1=peaks, peaklist2=reference_peak)
### Load Data ### data("encode_H3K27ac") # example peakfile GRanges object data("CnT_H3K27ac") # example peakfile GRanges object data("CnR_H3K27ac") # example peakfile GRanges object ### Create Named Peaklist ### peaks <- list("CnT"=CnT_H3K27ac, "CnR"=CnR_H3K27ac) reference_peak <- list("ENCODE"=encode_H3K27ac) ### Run ### overlap <- overlap_percent(peaklist1=peaks, peaklist2=reference_peak)
This function calculates the statistical significance of overlapping/ non-overlapping peaks against a reference peak file. If the reference peak file has the BED6+4 format (peak called by MACS2), the function generates a series of box plots showing the distribution of q-values for sample peaks that are overlapping and non-overlapping with the reference. If the reference peak file does not have the BED6+4 format, the function uses enrichPeakOverlap from ChIPseeker package to calculate the statistical significance of overlapping peaks only. In this case, please provide an annotation file as a TxDb object.
overlap_stat_plot( reference, peaklist, txdb = NULL, interact = FALSE, nShuffle = 50, digits = 4, workers = check_workers() )
overlap_stat_plot( reference, peaklist, txdb = NULL, interact = FALSE, nShuffle = 50, digits = 4, workers = check_workers() )
reference |
A reference peak file as GRanges object. |
peaklist |
A list of peak files as GRanges object.
Files must be listed and named using |
txdb |
A TxDb annotation object from Bioconductor. This is required only if the reference file does not have BED6+4 format. |
interact |
Default TRUE. By default, plots are interactive. If set FALSE, all plots in the report will be static. |
nShuffle |
shuffle numbers |
digits |
integer indicating the number of decimal places
( |
workers |
Number of threads to parallelize across. |
A named list.
"plot"boxplot/barplot showing the statistical significance of overlapping/non-overlapping peaks.
"data"Plot data.
### Load Data ### data("encode_H3K27ac") # example peakfile GRanges object data("CnT_H3K27ac") # example peakfile GRanges object data("CnR_H3K27ac") # example peakfile GRanges object ### Create Named Peaklist & Reference ### peaklist <- list('CnT'=CnT_H3K27ac, "CnR"=CnR_H3K27ac) reference <- list("ENCODE"=encode_H3K27ac) out <- overlap_stat_plot(reference = reference, peaklist = peaklist, workers = 1)
### Load Data ### data("encode_H3K27ac") # example peakfile GRanges object data("CnT_H3K27ac") # example peakfile GRanges object data("CnR_H3K27ac") # example peakfile GRanges object ### Create Named Peaklist & Reference ### peaklist <- list('CnT'=CnT_H3K27ac, "CnR"=CnR_H3K27ac) reference <- list("ENCODE"=encode_H3K27ac) out <- overlap_stat_plot(reference = reference, peaklist = peaklist, workers = 1)
This function generates upset plot of overlapping peaks files using the ComplexUpset package.
overlap_upset_plot(peaklist, verbose = TRUE)
overlap_upset_plot(peaklist, verbose = TRUE)
peaklist |
A named list of peak files as GRanges object.
Objects must be listed and named using |
verbose |
Print messages |
Upset plot of overlapping peaks.
### Load Data ### data("encode_H3K27ac") # load example data data("CnT_H3K27ac") # load example data peaklist <- list("encode"=encode_H3K27ac, "CnT"=CnT_H3K27ac) my_plot <- overlap_upset_plot(peaklist = peaklist)
### Load Data ### data("encode_H3K27ac") # load example data data("CnT_H3K27ac") # load example data peaklist <- list("encode"=encode_H3K27ac, "CnT"=CnT_H3K27ac) my_plot <- overlap_upset_plot(peaklist = peaklist)
This function outputs a table summarizing information on the peak files. Provides the total number of peaks and the percentage of peaks in blacklisted regions.
peak_info(peaklist, blacklist)
peak_info(peaklist, blacklist)
peaklist |
A named list of peak files as GRanges object.
Objects listed using |
blacklist |
A GRanges object containing blacklisted regions. |
A summary table of peak information
### Load Data ### data("encode_H3K27ac") # example peakfile GRanges object data("CnT_H3K27ac") # example peakfile GRanges object data("hg19_blacklist") # example blacklist GRanges object ### Named Peaklist ### peaklist <- list("encode"=encode_H3K27ac, "CnT"=CnT_H3K27ac) ### Run ### df <- peak_info(peaklist = peaklist, blacklist = hg19_blacklist)
### Load Data ### data("encode_H3K27ac") # example peakfile GRanges object data("CnT_H3K27ac") # example peakfile GRanges object data("hg19_blacklist") # example blacklist GRanges object ### Named Peaklist ### peaklist <- list("encode"=encode_H3K27ac, "CnT"=CnT_H3K27ac) ### Run ### df <- peak_info(peaklist = peaklist, blacklist = hg19_blacklist)
This function annotates peaks using ChIPseeker::annotatePeak
.
It outputs functional annotation of each peak file in a barplot.
plot_ChIPseeker_annotation( peaklist, txdb = NULL, tss_distance = c(-3000, 3000), interact = FALSE )
plot_ChIPseeker_annotation( peaklist, txdb = NULL, tss_distance = c(-3000, 3000), interact = FALSE )
peaklist |
A list of peak files as GRanges object.
Files must be listed and named using |
txdb |
A TxDb annotation object from Bioconductor. |
tss_distance |
A vector specifying the distance upstream and downstream
around transcription start sites (TSS).
The default value is |
interact |
Default TRUE. By default, plots are interactive. If set FALSE, all plots in the report will be static. |
ggplot barplot
### Load Data ### data("CnT_H3K27ac") # example peakfile GRanges object data("CnR_H3K27ac") # example peakfile GRanges object peaklist <- list("CnT"=CnT_H3K27ac, "CnR"=CnR_H3K27ac) my_plot <- plot_ChIPseeker_annotation(peaklist = peaklist, tss_distance = c(-50,50))
### Load Data ### data("CnT_H3K27ac") # example peakfile GRanges object data("CnR_H3K27ac") # example peakfile GRanges object peaklist <- list("CnT"=CnT_H3K27ac, "CnR"=CnR_H3K27ac) my_plot <- plot_ChIPseeker_annotation(peaklist = peaklist, tss_distance = c(-50,50))
Creates a heatmap using outputs from ChromHMM using ggplot2.The function
takes a list of peakfiles, performs ChromHMM and outputs a heatmap. ChromHMM
annotation file must be loaded prior to using this function.
ChromHMM annotations are aligned to hg19, and will be automatically lifted
over to the genome_build
to match the build of the peaklist
.
plot_chromHMM( peaklist, chromHMM_annotation, genome_build, cell_line = NULL, interact = FALSE, return_data = FALSE )
plot_chromHMM( peaklist, chromHMM_annotation, genome_build, cell_line = NULL, interact = FALSE, return_data = FALSE )
peaklist |
A named list of peak files as GRanges object. If list is not named, default names will be assigned. |
chromHMM_annotation |
ChromHMM annotation list. |
genome_build |
The human genome reference build used to generate peakfiles. "hg19" or "hg38". |
cell_line |
If not |
interact |
Default TRUE. By default, the heatmaps are interactive.
If |
return_data |
Return the plot data as in addition to the plot itself. |
ChromHMM heatmap, or a named list.
### Load Data ### data("CnT_H3K27ac") # example dataset as GRanges object data("CnR_H3K27ac") # example dataset as GRanges object ### Create Named Peaklist ### peaklist <- list(CnT=CnT_H3K27ac, CnR=CnR_H3K27ac) ### Run ### my_plot <- plot_chromHMM(peaklist = peaklist, cell_line = "K562", genome_build = "hg19")
### Load Data ### data("CnT_H3K27ac") # example dataset as GRanges object data("CnR_H3K27ac") # example dataset as GRanges object ### Create Named Peaklist ### peaklist <- list(CnT=CnT_H3K27ac, CnR=CnR_H3K27ac) ### Run ### my_plot <- plot_chromHMM(peaklist = peaklist, cell_line = "K562", genome_build = "hg19")
Plot correlation by binning genome and measuring correlation of peak quantile ranking. This ranking is based on p-value or other peak intensity measure dependent on the peak calling approach.
plot_corr( peakfiles, reference = NULL, genome_build, bin_size = 5000, keep_chr = NULL, drop_empty_chr = FALSE, method = "spearman", intensity_cols = c("total_signal", "qValue", "Peak Score", "score"), interact = FALSE, draw_cellnote = TRUE, fill_diag = NA, workers = check_workers(), show_plot = TRUE, save_path = tempfile(fileext = ".corr.csv.gz") )
plot_corr( peakfiles, reference = NULL, genome_build, bin_size = 5000, keep_chr = NULL, drop_empty_chr = FALSE, method = "spearman", intensity_cols = c("total_signal", "qValue", "Peak Score", "score"), interact = FALSE, draw_cellnote = TRUE, fill_diag = NA, workers = check_workers(), show_plot = TRUE, save_path = tempfile(fileext = ".corr.csv.gz") )
peakfiles |
A list of peak files as GRanges object and/or as paths to
BED files. If paths are provided, EpiCompare imports the file as GRanges
object. EpiCompare also accepts a list containing a mix of GRanges objects
and paths.Files must be listed and named using |
reference |
A named list containing reference peak file(s) as GRanges
object. Please ensure that the reference file is listed and named
i.e. |
genome_build |
The build of **all** peak and reference files to calculate the correlation matrix on. If all peak and reference files are not of the same build use liftover_grlist to convert them all before running. Genome build should be one of hg19, hg38, mm9, mm10. |
bin_size |
Default of 100. Base-pair size of the bins created to measure correlation. Use smaller value for higher resolution but longer run time and larger memory usage. |
keep_chr |
Which chromosomes to keep. |
drop_empty_chr |
Drop chromosomes that are not present in any of the
|
method |
Default spearman (i.e. non-parametric). A character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson", "kendall", or "spearman": can be abbreviated. |
intensity_cols |
Depending on which columns are present, this value will be used to get quantiles and ultimately calculate the correlations:
|
interact |
Default TRUE. By default heatmap is interactive. If FALSE, heatmap is static. |
draw_cellnote |
Draw the numeric values within each heatmap cell. |
fill_diag |
Fill the diagonal of the overlap matrix. |
workers |
Number of threads to parallelize across. |
show_plot |
Show the plot. |
save_path |
Path to save a table of correlation results to. |
list with correlation plot (corr_plot) and correlation matrix (data)
data("CnR_H3K27ac") data("CnT_H3K27ac") data("encode_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) reference <- list("encode_H3K27ac"=encode_H3K27ac) ## Increasing bin_size for speed here, ## but lower values will give more precise results (and lower correlations) cp <- plot_corr(peakfiles = peakfiles, reference = reference, genome_build = "hg19", bin_size = 5000, workers = 1)
data("CnR_H3K27ac") data("CnT_H3K27ac") data("encode_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) reference <- list("encode_H3K27ac"=encode_H3K27ac) ## Increasing bin_size for speed here, ## but lower values will give more precise results (and lower correlations) cp <- plot_corr(peakfiles = peakfiles, reference = reference, genome_build = "hg19", bin_size = 5000, workers = 1)
This function runs KEGG and GO enrichment analysis of peak files and generates dot plots.
plot_enrichment( peaklist, txdb = NULL, tss_distance = c(-3000, 3000), pvalueCutoff = 0.05, interact = FALSE, verbose = TRUE )
plot_enrichment( peaklist, txdb = NULL, tss_distance = c(-3000, 3000), pvalueCutoff = 0.05, interact = FALSE, verbose = TRUE )
peaklist |
A list of peak files as GRanges object.
Files must be listed and named using |
txdb |
A TxDb annotation object from Bioconductor. |
tss_distance |
A vector specifying the distance upstream and downstream
around transcription start sites (TSS).
The default value is |
pvalueCutoff |
P-value cutoff, passed to compareCluster. |
interact |
Default TRUE. By default, plots are interactive. If set FALSE, all plots in the report will be static. |
verbose |
Print messages. |
KEGG and GO dot plots
### Load Data ### data("CnT_H3K27ac") # example peakfile GRanges object data("CnR_H3K27ac") # example peakfile GRanges object ### Create Named Peaklist ### peaklist <- list("CnT"=CnT_H3K27ac, "CnR"=CnR_H3K27ac) enrich_res <- plot_enrichment(peaklist = peaklist, pvalueCutoff=1, tss_distance = c(-50,50))
### Load Data ### data("CnT_H3K27ac") # example peakfile GRanges object data("CnR_H3K27ac") # example peakfile GRanges object ### Create Named Peaklist ### peaklist <- list("CnT"=CnT_H3K27ac, "CnR"=CnR_H3K27ac) enrich_res <- plot_enrichment(peaklist = peaklist, pvalueCutoff=1, tss_distance = c(-50,50))
Plot precision-recall curves (and optionally F1 plots) by
iteratively testing for peak overlap across a series of
thresholds used to filter peakfiles
.
Each GRanges
object in peakfiles
will be used as the "query"
against each GRanges object in reference
as the subject.
Will automatically use any columns that are
specified with thresholding_cols
and present within each
GRanges object
to create percentiles for thresholding.
NOTE : Assumes that all GRanges in
peakfiles
and reference
are already
aligned to the same genome build.
plot_precision_recall( peakfiles, reference, thresholding_cols = c("total_signal", "qValue", "Peak Score"), initial_threshold = 0, n_threshold = 20, max_threshold = 1, workers = check_workers(), plot_f1 = TRUE, subtitle = NULL, color = "peaklist1", shape = color, facets = "peaklist2 ~ .", interact = FALSE, show_plot = TRUE, save_path = tempfile(fileext = "precision_recall.csv"), verbose = TRUE )
plot_precision_recall( peakfiles, reference, thresholding_cols = c("total_signal", "qValue", "Peak Score"), initial_threshold = 0, n_threshold = 20, max_threshold = 1, workers = check_workers(), plot_f1 = TRUE, subtitle = NULL, color = "peaklist1", shape = color, facets = "peaklist2 ~ .", interact = FALSE, show_plot = TRUE, save_path = tempfile(fileext = "precision_recall.csv"), verbose = TRUE )
peakfiles |
A list of peak files as GRanges object and/or as paths to
BED files. If paths are provided, EpiCompare imports the file as GRanges
object. EpiCompare also accepts a list containing a mix of GRanges objects
and paths.Files must be listed and named using |
reference |
A named list containing reference peak file(s) as GRanges
object. Please ensure that the reference file is listed and named
i.e. |
thresholding_cols |
Depending on which columns are present, GRanges will be filtered at each threshold according to one or more of the following:
|
initial_threshold |
Numeric threshold that was provided to SEACR
(via the parameter |
n_threshold |
Number of thresholds to test. |
max_threshold |
Maximum threshold to test. |
workers |
Number of threads to parallelize across. |
plot_f1 |
Generate a plot with the F1 score vs. threshold as well. |
subtitle |
Plot subtitle. |
color |
Variable to color data points by. |
shape |
Variable to set data point shapes by. |
facets |
|
interact |
Default TRUE. By default, plots are interactive. If set FALSE, all plots in the report will be static. |
show_plot |
Show the plot. |
save_path |
File path to save precision-recall results to. |
verbose |
Print messages. |
list with data and precision recall and F1 plots
data("CnR_H3K27ac") data("CnT_H3K27ac") data("encode_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) reference <- list("encode_H3K27ac" = encode_H3K27ac) pr_out <- plot_precision_recall(peakfiles = peakfiles, reference = reference, workers = 1)
data("CnR_H3K27ac") data("CnT_H3K27ac") data("encode_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) reference <- list("encode_H3K27ac" = encode_H3K27ac) pr_out <- plot_precision_recall(peakfiles = peakfiles, reference = reference, workers = 1)
Compute precision and recall using each GRanges
object in peakfiles
as the "query"
against each GRanges object in reference
as the subject.
precision_recall( peakfiles, reference, thresholding_cols = c("total_signal", "qValue", "Peak Score"), initial_threshold = 0, n_threshold = 20, max_threshold = 1, cast = TRUE, workers = 1, verbose = TRUE, save_path = tempfile(fileext = "precision_recall.csv"), ... )
precision_recall( peakfiles, reference, thresholding_cols = c("total_signal", "qValue", "Peak Score"), initial_threshold = 0, n_threshold = 20, max_threshold = 1, cast = TRUE, workers = 1, verbose = TRUE, save_path = tempfile(fileext = "precision_recall.csv"), ... )
peakfiles |
A list of peak files as GRanges object and/or as paths to
BED files. If paths are provided, EpiCompare imports the file as GRanges
object. EpiCompare also accepts a list containing a mix of GRanges objects
and paths.Files must be listed and named using |
reference |
A named list containing reference peak file(s) as GRanges
object. Please ensure that the reference file is listed and named
i.e. |
thresholding_cols |
Depending on which columns are present, GRanges will be filtered at each threshold according to one or more of the following:
|
initial_threshold |
Numeric threshold that was provided to SEACR
(via the parameter |
n_threshold |
Number of thresholds to test. |
max_threshold |
Maximum threshold to test. |
cast |
Cast the data into a format that's more compatible with ggplot2. |
workers |
Number of threads to parallelize across. |
verbose |
Print messages. |
save_path |
File path to save precision-recall results to. |
... |
Arguments passed on to
|
Overlap
data("CnR_H3K27ac") data("CnT_H3K27ac") data("encode_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) reference <- list("encode_H3K27ac" = encode_H3K27ac) pr_df <- precision_recall(peakfiles = peakfiles, reference = reference, workers = 1)
data("CnR_H3K27ac") data("CnT_H3K27ac") data("encode_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) reference <- list("encode_H3K27ac" = encode_H3K27ac) pr_df <- precision_recall(peakfiles = peakfiles, reference = reference, workers = 1)
Predict specific values of precision or recall by fitting a model to a precision-recall curve. Predictions that are <0 will automatically be set to 0. Predictions that are >100 will automatically be set to 100.
predict_precision_recall( pr_df, fun = stats::loess, precision = seq(10, 100, 10), recall = seq(10, 100, 10) )
predict_precision_recall( pr_df, fun = stats::loess, precision = seq(10, 100, 10), recall = seq(10, 100, 10) )
pr_df |
Precision-recall data.frame generated by precision_recall. |
fun |
Function to fit the data with. |
precision |
Precision values to predict recall from. |
recall |
Recall values to predict precision from. |
A named list of fitted models and predictions.
Fix for producing NAs from loess fun.
data("CnR_H3K27ac") data("CnT_H3K27ac") data("encode_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) reference <- list("encode_H3K27ac" = encode_H3K27ac) pr_df <- precision_recall(peakfiles = peakfiles, reference = reference) predictions <- predict_precision_recall(pr_df = pr_df)
data("CnR_H3K27ac") data("CnT_H3K27ac") data("encode_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) reference <- list("encode_H3K27ac" = encode_H3K27ac) pr_df <- precision_recall(peakfiles = peakfiles, reference = reference) predictions <- predict_precision_recall(pr_df = pr_df)
Standardise a list of peak files by rebinning them into fixd-width tiles across the genome.
rebin_peaks( peakfiles, genome_build, intensity_cols = c("total_signal", "qValue", "Peak Score", "score"), bin_size = 5000, keep_chr = NULL, sep = c(":", "-"), drop_empty_chr = FALSE, as_sparse = TRUE, workers = check_workers(), verbose = TRUE, ... )
rebin_peaks( peakfiles, genome_build, intensity_cols = c("total_signal", "qValue", "Peak Score", "score"), bin_size = 5000, keep_chr = NULL, sep = c(":", "-"), drop_empty_chr = FALSE, as_sparse = TRUE, workers = check_workers(), verbose = TRUE, ... )
peakfiles |
A list of peak files as GRanges object and/or as paths to
BED files. If paths are provided, EpiCompare imports the file as GRanges
object. EpiCompare also accepts a list containing a mix of GRanges objects
and paths.Files must be listed and named using |
genome_build |
The build of **all** peak and reference files to calculate the correlation matrix on. If all peak and reference files are not of the same build use liftover_grlist to convert them all before running. Genome build should be one of hg19, hg38, mm9, mm10. |
intensity_cols |
Depending on which columns are present, this value will be used to get quantiles and ultimately calculate the correlations:
|
bin_size |
Default of 100. Base-pair size of the bins created to measure correlation. Use smaller value for higher resolution but longer run time and larger memory usage. |
keep_chr |
Which chromosomes to keep. |
sep |
Separator to be used after chromosome name (first item) and between start/end genomic coordinates (second item). |
drop_empty_chr |
Drop chromosomes that are not present in any of the
|
as_sparse |
Return the rebinned peaks as a sparse matrix
(default: |
workers |
Number of threads to parallelize across. |
verbose |
Print messages. |
... |
Arguments passed on to
|
Binned peaks matrix
data("CnR_H3K27ac") data("CnT_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) #increasing bin_size for speed peakfiles_rebinned <- rebin_peaks(peakfiles = peakfiles, genome_build = "hg19", bin_size = 5000, workers = 1)
data("CnR_H3K27ac") data("CnT_H3K27ac") peakfiles <- list(CnR_H3K27ac=CnR_H3K27ac, CnT_H3K27ac=CnT_H3K27ac) #increasing bin_size for speed peakfiles_rebinned <- rebin_peaks(peakfiles = peakfiles, genome_build = "hg19", bin_size = 5000, workers = 1)
Reconstruct the EpiCompare command used to generate the current Rmarkdown report.
report_command(params, peaklist_tidy, reference_tidy)
report_command(params, peaklist_tidy, reference_tidy)
params |
Parameters supplied to the Rmarkdown template. |
peaklist_tidy |
Post-processed target peaks. |
reference_tidy |
Post-processed reference peaks. |
String reconstructing R function call.
# report_command()
# report_command()
Generate a header for EpiCompare reports generated using the EpiCompare.Rmd template.
report_header()
report_header()
Header string to be rendering within Rmarkdown file.
report_header()
report_header()
This function filters peak files by removing peaks in blacklisted regions and in non-standard chromosomes. It also checks that the input list of peakfiles is named. If no names are provided, default file names will be used.
tidy_peakfile(peaklist, blacklist)
tidy_peakfile(peaklist, blacklist)
peaklist |
A named list of peak files as GRanges object.
Objects must be named and listed using |
blacklist |
Peakfile specifying blacklisted regions as GRanges object. |
list of GRanges object
### Load Data ### data("encode_H3K27ac") # example peakfile GRanges object data("CnT_H3K27ac") # example peakfile GRanges object data("hg19_blacklist") # blacklist region for hg19 genome ### Create Named Peaklist ### peaklist <- list("encode"=encode_H3K27ac, "CnT"=CnT_H3K27ac) ### Run ### peaklist_tidy <- tidy_peakfile(peaklist = peaklist, blacklist = hg19_blacklist)
### Load Data ### data("encode_H3K27ac") # example peakfile GRanges object data("CnT_H3K27ac") # example peakfile GRanges object data("hg19_blacklist") # blacklist region for hg19 genome ### Create Named Peaklist ### peaklist <- list("encode"=encode_H3K27ac, "CnT"=CnT_H3K27ac) ### Run ### peaklist_tidy <- tidy_peakfile(peaklist = peaklist, blacklist = hg19_blacklist)
Translate the name of a genome build from one format to another.
translate_genome( genome, style = c("UCSC", "Ensembl", "NCBI"), default_genome = NULL, omit_subversion = TRUE )
translate_genome( genome, style = c("UCSC", "Ensembl", "NCBI"), default_genome = NULL, omit_subversion = TRUE )
genome |
A character vector of genomes equivalent to UCSC version or Ensembl Assemblies |
style |
A single value equivalent to "UCSC" or "Ensembl" specifying the output genome |
default_genome |
Default genome build when |
omit_subversion |
Omit any subversion suffixes after the ".". |
Standardized genome build name as a character string.
genome <- translate_genome(genome="hg38", style="Ensembl") genome2 <- translate_genome(genome="mm10", style="UCSC")
genome <- translate_genome(genome="hg38", style="Ensembl") genome2 <- translate_genome(genome="mm10", style="UCSC")
This function generates a plot of read count frequency around TSS.
tss_plot( peaklist, txdb = NULL, tss_distance = c(-3000, 3000), conf = 0.95, resample = 500, interact = FALSE, workers = check_workers() )
tss_plot( peaklist, txdb = NULL, tss_distance = c(-3000, 3000), conf = 0.95, resample = 500, interact = FALSE, workers = check_workers() )
peaklist |
A list of peak files as GRanges object.
Files must be listed and named using |
txdb |
A TxDb annotation object from Bioconductor. |
tss_distance |
A vector specifying the distance upstream and downstream
around transcription start sites (TSS).
The default value is |
conf |
Confidence interval threshold estimated by bootstrapping
( |
resample |
Number of bootstrapped iterations to run. Argument passed to plotAvgProf. |
interact |
Default TRUE. By default, plots are interactive. If set FALSE, all plots in the report will be static. |
workers |
Number of cores to parallelise bootstrapping across. Argument passed to plotAvgProf. |
A named list of profile plots.
### Load Data ### data("CnT_H3K27ac") # example peaklist GRanges object data("CnR_H3K27ac") # example peaklist GRanges object ### Create Named Peaklist ### peaklist <- list("CnT"=CnT_H3K27ac, "CnR"=CnR_H3K27ac) my_plot <- tss_plot(peaklist = peaklist, tss_distance=c(-50,50), workers = 1)
### Load Data ### data("CnT_H3K27ac") # example peaklist GRanges object data("CnR_H3K27ac") # example peaklist GRanges object ### Create Named Peaklist ### peaklist <- list("CnT"=CnT_H3K27ac, "CnR"=CnR_H3K27ac) my_plot <- tss_plot(peaklist = peaklist, tss_distance=c(-50,50), workers = 1)
This function creates boxplots showing the distribution of widths in each peak file.
width_boxplot(peaklist, interact = FALSE)
width_boxplot(peaklist, interact = FALSE)
peaklist |
A list of peak files as GRanges object.
Files must be named and listed using |
interact |
Default TRUE. By default, plots are interactive. If set FALSE, all plots in the report will be static. |
A boxplot of peak widths.
### Load Data ### data("encode_H3K27ac") # example peaklist GRanges object data("CnT_H3K27ac") # example peaklist GRanges object peaklist <- list("encode"=encode_H3K27ac, "CnT"=CnT_H3K27ac) my_plot <- width_boxplot(peaklist = peaklist)
### Load Data ### data("encode_H3K27ac") # example peaklist GRanges object data("CnT_H3K27ac") # example peaklist GRanges object peaklist <- list("encode"=encode_H3K27ac, "CnT"=CnT_H3K27ac) my_plot <- width_boxplot(peaklist = peaklist)
Write example peaks datasets to disk.
write_example_peaks( dir = file.path(tempdir(), "processed_results"), datasets = c("encode_H3K27ac", "CnT_H3K27ac", "CnR_H3K27ac") )
write_example_peaks( dir = file.path(tempdir(), "processed_results"), datasets = c("encode_H3K27ac", "CnT_H3K27ac", "CnR_H3K27ac") )
dir |
Directory to save peak files to. |
datasets |
Example datasets from EpiCompare to write. |
Named vector of paths to saved peak files.
save_paths <- EpiCompare::write_example_peaks()
save_paths <- EpiCompare::write_example_peaks()